Message from the General Chair



On behalf of Kyushu University, the host of the 5th International Working
Conference on Visual Database Systems, I welcome you to the Conference 
and to the city of Fukuoka. 

Fukuoka has a population of over one million and is the largest city on
Kyushu, one of the four major islands of Japan. Because of its geographical
location, Fukuoka has long served as a gateway to Eurasia, and the area
first appeared in a Chinese historical archive written around 200 AD. Because
of its closeness to Korea and China, there is a long history of exchange of
people with these countries. At Kyushu University, we are very happy to have
many students not only from China and Korea but also from other Asian
countries. We are very happy to have you from around the world in this exciting
and international city of Fukuoka.

The first International Working Conference on Visual Database Systems was
held in 1989 in Tokyo. I am very happy to welcome its 5th Conference back to
Japan in this special year of 2000, the first year of the new millennium and the
last year of the 20th century.

I am proud to say that we received many excellent papers and that each of
the selected papers is of a very high technical standard, deserving its place in
the proceedings published in this special year.

Visual information, the main topic of the conference, is found everywhere
in our daily life. Digital technology enables us to store and manipulate visual
data such as photos and video. With the Internet, visual data is used
as a medium for communication. Visual data is expected to play an increasingly
important role in the digital information society of the future.

This series of Working Conferences has contributed to enhancing the types
of visual data stored in databases and the ways to retrieve and display the data. I
believe this Conference will spur further advancement in visual data
technologies.

I thank the Program Co-chairs, Hiroshi Arisawa and Tiziana Catarci, for
their outstanding work. Many thanks also go to Stefano Spaccapietra, the
former General Chair, and to Yahiko Kambayashi, Professor at Kyoto University,
for their help in organizing the Conference. Last but not least, I thank the many
other people who have worked long hours and contributed to the success of
this conference.



Akifumi Makinouchi 
General Chair 




Message from the Program Co-Chairs



The Fifth IFIP 2.6 Working Conference on Visual Database Systems (VDB5) 
is held in Fukuoka, Japan, May 10-12, 2000. It follows the tradition of past 
VDB meetings in bringing together researchers, practitioners, and developers, 
to explore new concepts, tools, and techniques for both visual interfaces to 
database systems and management of visual data. It provides a forum for
intensive discussion of original research contributions and practical system
design, implementation, and evaluation.

Responding to the call for papers, we received thirty-four (34) papers. Each
paper was carefully reviewed by at least three members of the program
committee (PC), and then discussed over the Internet among the reviewers. At the
final PC meeting held in Kyoto, Japan, the committee decided to accept
eighteen (18) papers for presentation during the regular sessions of the conference,
and four (4) of the remaining submissions were also selected to be presented
as short presentations. In addition, one panel was accepted, strengthening
the conference’s program and bringing greater diversity to its topics.

The conference also features two invited lectures by recognized leaders in
the fields of user interfaces and multimedia database systems, respectively.
These are "hot" topics within the main themes of the conference, and the invited
lectures are intended to sow the seeds for fruitful discussions on the future
of visual information development.

It is our belief that the conference program will give all participants
a good opportunity to discuss and review the new trends in next-generation
database systems, and to achieve a deeper understanding of the technological
foundations behind them.

The program committee would like to thank all those who submitted papers, 
panel or other proposals, and all contributors who supported the conference 
program. In turn, we would like to express our deep appreciation to all mem- 
bers of the program committee for their very hard work and dedication, and 
especially for their efficiency and effectiveness during the electronic interac- 
tions over the network. 



Hiroshi Arisawa
Tiziana Catarci 
Program Co-chairs 
CONSTRUCTION OF THE MULTIMEDIA
MEDIATION SYSTEMS 



Masao Sakauchi 

Institute of Industrial Science, University of Tokyo 
7-22-1 Roppongi, Minato-ku, Tokyo, 106-8558, Japan
sakauchi@sak.iis.u-tokyo.ac.jp 

Introduction 

The multimedia information space based on video or image data is
expanding rapidly by means of various distribution tools. Three
types of multimedia information spaces (or environments), i.e. “in the
real-world”, “in the digital broadcasting stream” and “on the network”,
are especially promising. On the other hand, from social and economic
viewpoints, the importance of information processing techniques that can
create real value for human activity or life should be clearly recognized.

Against this background, we are now developing a new
multimedia database system, named the Multimedia Mediation Systems,
as application-oriented middleware for realizing the functions and services
demanded by people and society.

In the speech, the framework of the Multimedia Mediation Systems and
the basic functions for realizing three types of concrete Multimedia
Mediation Systems, i.e. the Real-world type Multimedia System, the Stream type
Multimedia System and the Network type Multimedia System, will be
discussed with several embodiments, mainly based on our research project
(http://shinpro.sak.iis.u-tokyo.ac.jp/index-e.html).
Real-world MM Mediation functions




Figure 1 Basic mediation functions for Real-world type MMS 



1. THREE TYPES OF “MULTIMEDIA
ENVIRONMENT”

<Real-world type MMS>

Many promising ITS (Intelligent Transport System) applications
require realtime acquisition of traffic conditions on the roads. The
earthquake at Kobe in 1995 taught us the importance of realtime acquisition
of city information for disaster mitigation. At the same time,
multimedia communication technology enables us to establish a new type
of database which collects and analyzes realtime situations (video
information) to tell us “What’s going on in the city” on a realtime basis. Let us
call this type the “Real-world type MM Environment”.

Though systems of this type have generally not been considered
database systems, they hold promise for realizing new services
and businesses through realtime integration of real-world situations with
other existing databases.

Our first target is the Real-world type Multimedia Mediation
System, which copes with these situations. Fig. 1 shows basic mediation functions
for the Real-world type MMS. The target multimedia data include
various data from robot cameras, mobile units or network sites, reflecting
realtime situations in the real world, such as town scenes or traffic on
the roads. ITS, various applications for town life, and social security systems
are examples of target applications. The following concrete research results
will be presented in this talk:

■ Development of event discovery/situation understanding from real-world
video data

■ Development of precise real-world mediation functions

■ Development of wide-area real-world mediation functions

■ Domain-oriented applications (ITS, GIS etc.) 

<Stream type MMS>

Needless to say, one of the leading providers of multimedia information
(contents) is broadcasting combined with communication.
Commercial satellite digital broadcasting with hundreds of
channels has already started in the USA, Japan, Europe and Asia. Other
digital broadcasting in the form of ordinary terrestrial TV,
Fibernet communication or even the Internet has also already started
or will start soon. In such situations, where we will be able to enjoy
hundreds or thousands of broadcasting channels, much more user-oriented
and intelligent access to such a tremendous amount of “contents stream”
will be required.

Basic mediation functions for this Stream type MMS are shown in
Fig. 2. In this case, target multimedia data include video streams in digital
broadcasting, video contents, etc. New interactive video services and
personal media services are examples of target applications. The following
concrete functions will be presented as embodiments:

■ Proposal and Realization of the framework of MM data description 
and Utilization 

■ Automated describing functions of video streams 

■ Data collaboration/Data Retrieval/Event discovery for video stream 

■ Creation of new interactive broadcasting services and applications 

■ High performance architectures for huge multimedia data space 

<Network type MMS> 

The third target is mediation functions for the Network type MMS
for the WWW space on the Internet. In this case, mediation functions for
realizing an advanced search engine or mediation for solutions include event
discovery (data mining), data retrieval, data collaboration and interfaces
for mediation. Some examples will be presented in this talk.
[Figure 2 panels: Video Stream Space (digital broadcasting, video contents,
video, real-world images); Mediation functions (event discovery and multimedia
data mining, stream data description, data integration/organization, data
retrieval, data creation and presentation); Application (contents
creation/processing, service creation, new interactive video service, personal
media service)]

Figure 2 Basic mediation functions for Stream type MMS
A NEW ALGEBRAIC APPROACH TO RETRIEVE
MEANINGFUL VIDEO INTERVALS FROM 
FRAGMENTARILY INDEXED VIDEO SHOTS* 



Sujeet Pradhan
Kurashiki Univ. of Science & the Arts
sujeet@soft.kusa.ac.jp



Takashi Sogo
Kobe University
sogoh@db.cs.kobe-u.ac.jp



Keishi Tajima 
Kobe University 
tajima@db.cs.kobe-u.ac.jp 



Katsumi Tanaka 
Kobe University 
tanaka@db.cs.kobe-u.ac.jp 



Abstract Video data consists of a sequence of shots. Over the past several years, substan- 
tial progress has been made in automatically detecting shot boundaries based 
on changes of visual and/or audio characteristics. There has also been consider- 
able progress in indexing such video shots by automatically extracting keywords 
using techniques such as speech and text recognition. Shots detected by those
techniques, however, are very fragmentary. A single shot itself is rarely
self-contained and therefore may not carry enough information to be a meaningful
unit. A meaningful interval that interests common users generally spans several 
consecutive shots. There hardly exists any reliable technique for identifying all 
such meaningful intervals in advance so that any possible query can be answered. 

In this paper, rather than identifying meaningful intervals beforehand, we
shift our focus to how to compute them dynamically from fragmentarily indexed
shots when queries are issued. We achieve our goal by using two techniques:
glues and filters. Glues are algebraic operations for composing all the longer
intervals, which can be meaningful answers to a given query, from a set of shorter
indexed shots. Glue operations place no limit on the length of the resulting
intervals. Consequently, lengthy intervals containing several irrelevant shots
are also expected to be composed as possible answers. Therefore, we provide
filter functions so that such lengthy intervals are excluded from the answer set and
only a few relevant intervals are returned to the user. Both glues and filters possess
certain algebraic properties that are useful for efficient query processing.

Keywords: video query model, interval query, glue operations, interval filters 



*This work is supported partly by the Japanese Ministry of Education under Grant-in-Aid for Scientific 
Research on Priority Area: “Advanced Databases”, No. 08244103 and partly by Research for the Future 
Program of JSPS under the Project “Researches on Advanced Multimedia Contents Processing”.
1. INTRODUCTION

Video segmentation is the most fundamental process for appropriate
indexing and retrieval of video intervals. In general, video streams are composed
of shots¹ delimited by physical shot boundaries. Substantial work has been
done on how to detect such shot boundaries automatically (Arman et al., 1993)
(Zhang et al., 1993) (Zhang et al., 1995) (Kobla et al., 1997). Through the
integration of technologies such as image processing, speech/character recognition
and natural language understanding, keywords can be extracted and associated
with these shots for indexing (Wactlar et al., 1996). A single shot, however,
rarely carries enough information to be meaningful by itself. Usually,
it is a semantically meaningful interval that most users are interested in
retrieving. Generally, such meaningful intervals span several consecutive shots.

There hardly exists any efficient and reliable technique, either automatic or
manual, to identify all semantically meaningful intervals within a video stream.
Works by (Smith and Davenport, 1992) (Oomoto and Tanaka, 1993) (Weiss
et al., 1995) (Hjelsvold et al., 1996) suggest manually defining all such
intervals in the database in advance. However, even an hour-long video may have
an indefinite number of meaningful intervals. Moreover, video data is open to
multiple interpretations. Therefore, given a query, what is a meaningful interval to an
annotator may not be meaningful to the user who issues the query. In practice,
manual indexing of meaningful intervals is labour intensive and inadequate.

Some efforts have been made to automatically detect and index meaningful
intervals in advance for retrieval (Wactlar et al., 1996) and for browsing
(Yeung et al., 1996). The former (Wactlar et al., 1996) decomposes a video
stream into paragraph units, which are considered to be pre-defined answers for
pre-defined queries. The latter (Yeung et al., 1996) identifies story units
within a video stream on the basis of visual similarity and temporal locality
relationships among video shots. Although they are successful to some extent,
there still remains the problem of discrepancies between the granularity of the
answer intervals that have been detected and the granularity of the intervals
that end users expect to retrieve. This is because, until a user issues a query, it is
not clear which meaningful intervals should have been identified beforehand.

Suppose we have a live video stream of a baseball match, which has been
segmented into shots {s1, s2, ..., sn} and indexed (Figure 1).

□ s3 shows Matsui (a Japanese baseball player) preparing to face the next pitch.

□ s4 is the shot of Matsui hitting the ball.

□ s5 is the shot of the ball clearing the fence.

□ s6 shows a glimpse of spectators cheering.

□ s7 shows Matsui completing the run and touching home plate.



¹ A shot is a continuous sequence of frames captured from a single camera with no shutter interruption.
Now let us consider this query: “retrieve a video interval which shows
Matsui hitting a homer”. Rarely would a user hope to retrieve only shot s4,
which shows the actual moment of Matsui hitting the ball, or only s5, which
shows the ball crossing the fence. It is clear that at shot level, there will be
only fragmentary answers to this query. A user would hope to see, at least, the
interval starting from shot s4 until the end of s5. It is difficult to identify
the exact answer interval to such queries beforehand. There are many intervals
(those represented by s3-s7, s3-s5, s4-s5, s4-s7 in Figure 1) that can be
considered meaningful and thus should be returned as answers to the above query.
Only the query issuer knows which is the best among these four intervals for
him or her.

This paper takes a different approach towards answering keywords-based
video queries. We assume that video streams are segmented only at shot level,
since that is the best we can do with present technology. We also assume
that keywords are associated fragmentarily with these shots. Generally, we
are less interested in what kinds of queries are going to be issued and what
intervals are to be indexed to answer such queries. Instead, we focus our work
on how to compose intervals that could possibly be the answers to a given set
of keywords. In order to do so, we define a set of new algebraic operations,
which we call glue operations, that dynamically compose answer intervals from
a set of indexed shots. Intuitively, these operations enable us to compose
all possible answer intervals to a given query, first by selecting the valid video
shots for each query keyword and then by gluing those shots together. It should
be noted here that our approach to retrieving meaningful intervals is syntactic




[Figure 1 content: candidate intervals s3-s7, s3-s5, s4-s5; shot annotations
“Matsui preparing for the next pitch”, “Yamada pitches a straight ball and
Matsui hits beautifully”, “backscreen homer”, “Spectators cheer wildly”,
“Matsui touches home plate”]

Figure 1 Video stream segmented into shots and indexed
rather than semantic.² The glue operations possess certain algebraic properties
which allow a simple and efficient composition of answer intervals.

As we have no prior knowledge about the boundaries of meaningful
intervals, glue operations will go on composing longer intervals as long as valid shots
associated with the query keywords are found for gluing together. In the worst
case, if two query keywords happen to fall on the first and the final
shot of the video stream, the full one-hour video may be returned
as a valid answer. However, lengthy intervals containing plenty
of unnecessary shots are usually irrelevant to users, and as many of them as
possible should be excluded from the answer set. We propose filtering
techniques for discarding answer intervals that can be considered irrelevant to users.
Intuitively, we provide a filter function that takes a set of video intervals as
input and returns the subset that meets some necessary conditions for a query
match. Importantly, the proposed filter functions also possess certain algebraic
properties and can be well integrated with the glue operations. As a result,
a considerable number of irrelevant intervals can be removed at the initial stage
of query processing, which lays the groundwork for an efficient
query mechanism. The detailed explanation is given later.

2. INTERVAL COMPOSITION TECHNIQUES 

A great deal of work has been done in the past on composing new video
intervals from a set of intervals. Various interval operations such as union,
intersection, concatenation and their set variants have been defined and redefined
in (Oomoto and Tanaka, 1993) (Weiss et al., 1995) (Hwang and Subrahmanian,
1996) (Hjelsvold et al., 1996). However, given a set of fragmentarily indexed
video shots, these operations generally produce fragmentary intervals and thus
cannot always produce the intervals that users intend to
find in the first place.

2.1. QUERY INTERPRETATION

Let us again consider the query “retrieve a video interval which shows
Matsui hitting a homer” (Figure 1). In conventional approaches, a query result
is computed by simply taking the intersection between those video intervals
with an attribute value Matsui and those with an attribute value homer. An
actual scene of ‘Matsui hitting a homer’, however, usually consists of
several shots. This is primarily because video productions involve lots of switching


² In other words, we do not consider the semantics in the keywords themselves. For example, a keyword
like bat may mean either a flying mammal or a wooden stick used in sports. Nor do we consider any semantics in the
way they are ordered. For example, a query like {dog, run, man} may retrieve intervals showing not only
“a dog running after a man” but also “a man running after a dog”.
between the cameras, camera movements, zooming, and panning. It rarely
happens that each shot in that scene contains both Matsui and homer (see
Figure 1). Some of them may show only one of the two, and there may even
be a filler shot showing neither of them but simply a glimpse of spectators. It
should be noted that such filler shots are even more common in edited motion
pictures. Therefore, the user who issues this query does not
necessarily expect a video interval that shows Matsui and/or homer in each
shot throughout its play. Intuitively, the above query can be interpreted as:
“retrieve a contiguous video interval in which each keyword Matsui and homer
emerges somewhere at least once”, or a sequence of shots in which each
keyword emerges in at least one shot.
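
This interpretation can be sketched in a few lines of Python. The snippet is illustrative only: the frame numbers, the `index` dictionary and the helper name `matches` are assumptions made for the example, not part of the model defined in this paper. It checks whether every query keyword occurs in at least one shot lying inside a candidate interval.

```python
# Hypothetical shot index: (start_frame, end_frame) -> keywords of that shot.
index = {
    (0, 9): {"Matsui"},
    (10, 19): {"Matsui", "hit"},
    (20, 29): {"homer"},
    (30, 39): set(),            # filler shot: spectators, no query keyword
    (40, 49): {"Matsui"},
}

def matches(index, interval, query):
    """True if each query keyword occurs in at least one shot inside interval."""
    fs, fe = interval
    covered = set()
    for (s, e), keywords in index.items():
        if fs <= s and e <= fe:      # shot lies entirely inside the interval
            covered |= keywords
    return all(k in covered for k in query)

query = {"Matsui", "homer"}
print(matches(index, (0, 49), query))   # True: the filler shot does not matter
print(matches(index, (10, 19), query))  # False: no shot here carries "homer"
```

Under this reading a filler shot inside the interval is harmless, which is exactly what the conventional intersection-based approach cannot express.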

Producing appropriate intervals from the actual video data to answer even
such a simple query calls for new interval operations. Our goal is to develop
a set of operations that can compute answers to queries on a video
database. The database simply contains video shots that are fragmentarily indexed
by descriptive keywords.

2.2. VIDEO INTERVAL

A video interval³ is a stream of contiguous frames and is uniquely defined
by a pair of frame numbers, the starting frame number and the ending frame number,
which are represented by $f_s$ and $f_e$ respectively. We write $f_s(i)$ and $f_e(i)$ to
indicate the starting frame number and the ending frame number respectively
of an interval $i$, where $f_s < f_e$. An interval is denoted by $i[f_s, f_e]$, or simply
by $i$ whenever $[f_s, f_e]$ can be omitted.

A video interval is indexed by a set of keywords $\{k_1, k_2, \ldots, k_n\}$. To a
query keyword $k$, valid shots are the ones which are associated with the
keyword $k$.
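
As a minimal sketch of these definitions, an interval can be modelled as an immutable pair of frame numbers and an index as a mapping from shots to keyword sets. The class name `Interval`, the `index` contents and the helper `valid_shots` are hypothetical choices made for this example only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """A video interval identified by its starting and ending frame numbers."""
    fs: int  # starting frame number
    fe: int  # ending frame number

# Hypothetical index: each shot (an Interval) maps to its set of keywords.
index = {
    Interval(0, 99): {"pitcher"},
    Interval(100, 160): {"batter", "hit"},
    Interval(161, 200): set(),          # a shot with no keywords
    Interval(201, 260): {"batter"},
}

def valid_shots(index, keyword):
    """Return the shots associated with the given query keyword."""
    return {shot for shot, keywords in index.items() if keyword in keywords}

print(valid_shots(index, "batter"))
```

Making the dataclass frozen keeps intervals hashable, so they can be stored in sets, which is convenient for the set-based operations defined next.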



2.3. GLUE OPERATIONS

Here, we present formal definitions of our interval operations required to 
compute all possible answer intervals to a keywords-based query. Their prop- 
erties will clearly reflect the query semantics that we informally stated above. 

2.3.1 Interval glue. Given two video intervals $x$ and $y$, the operation
interval glue ($\oplus$) on these two intervals yields a single interval $i$ as follows:

$x \oplus y = i[f_s, f_e]$ where

$f_s = \min(f_s(x), f_s(y))$ and

$f_e = \max(f_e(x), f_e(y))$



³ A shot is also a video interval, but not necessarily a meaningful interval. Interval is used instead wherever
no distinction is necessary.
Figure 2 Interval Glue between x and y



Thus, supposing there are two video intervals $x[100, 160]$ and $y[180, 210]$,
then $x \oplus y = i[100, 210]$ (Figure 2).
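
The example above can be reproduced with a short sketch. Intervals are represented here as plain `(start_frame, end_frame)` tuples, and the function name `glue` is chosen for illustration.

```python
def glue(x, y):
    """Interval glue: the smallest contiguous interval covering both x and y.

    Intervals are (start_frame, end_frame) tuples.
    """
    return (min(x[0], y[0]), max(x[1], y[1]))

x = (100, 160)
y = (180, 210)
print(glue(x, y))  # (100, 210)
```

Frames 161 to 179, which belong to neither input, are part of the result; this is what makes the output contiguous.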

The basic idea of this operation is that even if two input intervals are not
abutting, the resulting interval will be contiguous. This operation is important
because of our assumption that keywords are associated with only fragmentary
intervals. Moreover, it is clear from the example above that related shots are
often separated by filler shots, which may not be indexed by the keywords that
appear in the query. Note that the concatenation operation
between two non-contiguous intervals, which is used in much existing research,
does not produce a contiguous interval, thus losing the original context of the
data. Our interval glue operation is more appropriate than the concatenation
operation in the context of video interval composition.

Some important algebraic properties of the interval glue operation are:

■ Commutativity: $x \oplus y = y \oplus x$ (by the definition of interval glue)

■ Associativity: $(x \oplus y) \oplus z = x \oplus (y \oplus z)$. Hence, hereafter we write
$(x \oplus y) \oplus z$ as $x \oplus y \oplus z$ (proof omitted for space reasons)

■ Idempotence: $x \oplus x = x$ (by the definition of interval glue)
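
These three properties can be checked exhaustively on a small frame range. The following sketch is a sanity check rather than a proof; the tuple representation and the name `glue` are assumptions of the example.

```python
from itertools import product

def glue(x, y):
    """Interval glue on (start_frame, end_frame) tuples."""
    return (min(x[0], y[0]), max(x[1], y[1]))

# All intervals (fs, fe) with fs < fe over frames 0..5.
intervals = [(s, e) for s in range(5) for e in range(s + 1, 6)]

commutative = all(glue(x, y) == glue(y, x)
                  for x, y in product(intervals, repeat=2))
associative = all(glue(glue(x, y), z) == glue(x, glue(y, z))
                  for x, y, z in product(intervals, repeat=3))
idempotent = all(glue(x, x) == x for x in intervals)

print(commutative, associative, idempotent)  # True True True
```

All three follow from the same properties of `min` and `max`, which is the idea behind the omitted associativity proof.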

2.3.2 Pairwise glue. This is the set variant of the interval glue operation.
Given two sets of video intervals $X$ and $Y$, the operation pairwise glue ($\odot$)
returns a set of video intervals yielded by the pairwise interval glue operation ($\oplus$)
between the elements of the two input sets. (See Figure 3)

$X \odot Y = \{x \oplus y \mid x \in X \text{ and } y \in Y\}$

For a given set of intervals $X = \{i_1, \ldots, i_n\}$ $(n \geq 1)$, $\bigoplus(X)$ denotes
$i_1 \oplus \cdots \oplus i_n$. This notation will be used in the definition of the powerset glue operation
below.
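
A minimal sketch of both notations follows. The names `pairwise_glue` and `glue_all` and the sample sets are hypothetical; `glue_all` relies on the associativity of interval glue, so gluing a whole set reduces to taking the smallest start frame and the largest end frame.

```python
def glue(x, y):
    """Interval glue on (start_frame, end_frame) tuples."""
    return (min(x[0], y[0]), max(x[1], y[1]))

def pairwise_glue(X, Y):
    """Glue every interval of X with every interval of Y."""
    return {glue(x, y) for x in X for y in Y}

def glue_all(X):
    """Glue all intervals of a non-empty collection into one interval."""
    return (min(fs for fs, _ in X), max(fe for _, fe in X))

X = {(0, 10), (40, 50)}
Y = {(20, 30)}
print(sorted(pairwise_glue(X, Y)))  # [(0, 30), (20, 50)]
print(glue_all(X | Y))              # (0, 50)
```

The result of a pairwise glue has at most |X| times |Y| elements, and fewer when different pairs glue to the same interval.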

The pairwise glue has the following algebraic properties.

■ Commutativity: $X \odot Y = Y \odot X$ (by the definition of pairwise glue)

■ Associativity: $(X \odot Y) \odot Z = X \odot (Y \odot Z)$ (proof omitted)
However, the following example shows that the idempotence law does not
hold. Suppose $X = \{i_1, i_2\}$. Then $(X \odot X) = \{(i_1 \oplus i_1), (i_2 \oplus i_2), (i_1 \oplus i_2)\}
= \{i_1, i_2, (i_1 \oplus i_2)\}$, which is different from $X$. Hence, $X \neq (X \odot X)$.

It should also be noted that $(X \odot X) = (X \odot X \odot X) = (X \odot \cdots \odot X)$.
This property is used in the transformation of the powerset glue operation, which
will be explained in the following subsection.
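
Both observations can be seen on a two-element set. The sketch below uses hypothetical frame numbers and the illustrative name `pairwise_glue`.

```python
def glue(x, y):
    """Interval glue on (start_frame, end_frame) tuples."""
    return (min(x[0], y[0]), max(x[1], y[1]))

def pairwise_glue(X, Y):
    """Glue every interval of X with every interval of Y."""
    return {glue(x, y) for x in X for y in Y}

X = {(0, 10), (40, 50)}
XX = pairwise_glue(X, X)      # X glued with itself once
XXX = pairwise_glue(XX, X)    # ... and once more

print(sorted(XX))   # [(0, 10), (0, 50), (40, 50)]
print(XX == X)      # False: pairwise glue is not idempotent
print(XXX == XX)    # True: further self-gluing adds nothing new
```

The second result holds because the glue of any number of intervals is determined by the one with the smallest start frame and the one with the largest end frame, so it already appears as the glue of some pair.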

Suppose $S_{k_1}$ and $S_{k_2}$ are two sets of shots associated with the keywords
$k_1$ and $k_2$ respectively. Assuming that no shot contains both
keywords, $S_{k_1} \odot S_{k_2}$ represents a set of intervals such that each interval
is glued from exactly one valid shot for each of the keywords $k_1$ and $k_2$.




Figure 3 Pairwise Glue Operation between X and Y

Figure 4 Powerset Glue Operation between X and Y



2.3.3 Powerset glue. Given two sets of video intervals $X$ and $Y$, the
operation powerset glue ($\otimes$) returns a set of video intervals. These intervals
are yielded by applying the interval glue operation ($\oplus$) to an arbitrary number (but
not 0) of elements in $X$ and $Y$.

$X \otimes Y = \{\bigoplus(X' \cup Y') \mid X' \subseteq X,\ Y' \subseteq Y,\ X' \neq \emptyset \text{ and } Y' \neq \emptyset\}$

In Figure 4, intervals $i_1, \ldots, i_4$ are yielded by applying the interval glue
operation to pairs of intervals, each pair consisting of one element from each of $X$ and
$Y$. The interval $i_5$ is yielded by the same operation on a set of intervals, the
set consisting of two elements from $X$ and either one or two elements from $Y$.
We may also consider intervals produced by taking two elements from $Y$ and
one from $X$, but the results will be the same as $i_2$ and $i_3$.
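
The definition can be evaluated directly by enumerating non-empty subsets. The sketch below does so for a hypothetical layout in which both intervals of Y lie between the two intervals of X; this layout is an assumption chosen to be consistent with the description of Figure 4, not a transcription of the figure.

```python
from itertools import combinations

def nonempty_subsets(S):
    """Yield every non-empty subset of S as a tuple."""
    items = list(S)
    for r in range(1, len(items) + 1):
        yield from combinations(items, r)

def powerset_glue(X, Y):
    """Glue every non-empty subset of X with every non-empty subset of Y."""
    result = set()
    for xs in nonempty_subsets(X):
        for ys in nonempty_subsets(Y):
            chosen = xs + ys
            result.add((min(fs for fs, _ in chosen),
                        max(fe for _, fe in chosen)))
    return result

X = {(0, 10), (60, 70)}
Y = {(20, 30), (40, 50)}
print(sorted(powerset_glue(X, Y)))
# [(0, 30), (0, 50), (0, 70), (20, 70), (40, 70)]
```

Four of the five results come from one-to-one pairs, and (0, 70) arises only when both intervals of X are used. The enumeration visits (2^|X| - 1)(2^|Y| - 1) subset pairs, which is why a direct evaluation becomes expensive for large sets.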

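The powerset glue operation between $X$ and $Y$ is formulated as:

$X \otimes Y = (X \odot Y) \cup (X \odot X \odot Y) \cup (X \odot Y \odot Y) \cup (X \odot X \odot X \odot Y) \cup (X \odot X \odot Y \odot Y) \cup (X \odot Y \odot Y \odot Y) \cup \cdots$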
The primary difference between the pairwise glue ($\odot$) and powerset glue
($\otimes$) operations on two sets of intervals is that the former considers only
one interval from each set whereas the latter considers one or more
intervals from each set. The powerset glue is the operation that we actually use
for computing answer intervals to a query. It should be noted that the pairwise
glue operation is not enough to compute all the possible answer intervals, since
it considers only one interval from each set of valid shots. However, powerset
glue is able to compute each possible answer interval, no matter how many
valid shots are required to make up such an interval.

Again, suppose $S_{k_1}$ and $S_{k_2}$ are two sets of shots associated with the
keywords $k_1$ and $k_2$ respectively. Then $S_{k_1} \otimes S_{k_2}$ represents a set of intervals
such that each interval is glued from one or more
valid shots for each of the keywords $k_1$ and $k_2$.

The definition of the powerset glue operation is clearly complex. Evaluating it directly
takes an enormous amount of computation, especially when the number of
intervals contained in $X$ or $Y$ is large. However, the original definition of powerset
glue can be transformed into a simpler and more efficient expression which
involves only three pairwise glue operations. This is one of the main contributions of
this paper. The following theorem states the transformed expression.

Theorem 1 For any interval sets $X$ and $Y$, the following equation holds.

$X \otimes Y = (X \odot X) \odot (Y \odot Y)$



Proof: See Appendix. 
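
As an informal complement to the proof, the equation can be tested numerically by comparing a direct evaluation of the definition with the three-pairwise-glue expression on randomly generated interval sets. The helper names and the random generator settings below are assumptions of this sketch.

```python
import random
from itertools import combinations

def glue(x, y):
    """Interval glue on (start_frame, end_frame) tuples."""
    return (min(x[0], y[0]), max(x[1], y[1]))

def pairwise_glue(X, Y):
    return {glue(x, y) for x in X for y in Y}

def nonempty_subsets(S):
    items = list(S)
    for r in range(1, len(items) + 1):
        yield from combinations(items, r)

def powerset_glue_by_definition(X, Y):
    """Direct evaluation: glue every non-empty X' with every non-empty Y'."""
    result = set()
    for xs in nonempty_subsets(X):
        for ys in nonempty_subsets(Y):
            chosen = xs + ys
            result.add((min(i[0] for i in chosen), max(i[1] for i in chosen)))
    return result

def powerset_glue_transformed(X, Y):
    """Right-hand side of Theorem 1: three pairwise glue operations."""
    return pairwise_glue(pairwise_glue(X, X), pairwise_glue(Y, Y))

def random_intervals(rng, n):
    """Up to n random intervals with start frame below 50 and length 1..9."""
    intervals = set()
    for _ in range(n):
        fs = rng.randrange(0, 50)
        intervals.add((fs, fs + rng.randrange(1, 10)))
    return intervals

rng = random.Random(0)  # fixed seed so the check is repeatable
agree = all(
    powerset_glue_by_definition(X, Y) == powerset_glue_transformed(X, Y)
    for X, Y in (
        (random_intervals(rng, 4), random_intervals(rng, 4)) for _ in range(200)
    )
)
print(agree)  # True
```

The transformed expression needs a number of interval glue operations that is polynomial in |X| and |Y|, whereas the direct evaluation enumerates exponentially many subset pairs.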




Figure 5 Transformation of Powerset Glue definition into a simpler expression



In Figure 5, the operations $(X \odot X)$ and $(Y \odot Y)$ produce two sets, each
consisting of three intervals. A further pairwise glue operation on these two sets
yields five intervals $i_1, \ldots, i_5$, which is the same desired set of results (see
Figure 4).
2.4. ANSWER TO A QUERY

A query $Q$ is represented by a set of keywords $\{k_1, \ldots, k_n\}$, which is
interpreted as “retrieve contiguous video intervals in which each keyword $k_1, \ldots, k_n$
appears somewhere”. The answer to this query will be a set of intervals, each
interval containing at least one valid shot for every query keyword $k_1, \ldots, k_n$.
Formally, it is represented as:

$S_{k_1} \otimes S_{k_2} \otimes \cdots \otimes S_{k_n}$

where $S_{k_1}, \ldots, S_{k_n}$ are the sets of valid shots for the query keywords $k_1, \ldots, k_n$
respectively.

When a query is issued, valid shots are first selected for each query term. 
Then the operation powerset glue which we derived in Theorem 1 is applied to 
these sets of valid shots to produce answer intervals. If the query consists of 
more than two query terms, powerset glue is performed first between any two 
arbitrary sets of valid shots, which produces an intermediate set of intervals. 
The powerset glue operation is recursively performed between the intermediate 
set and each remaining set of valid shots until no sets of valid shots are left. 
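
The procedure just described amounts to a fold over the per-keyword sets of valid shots. The following is a minimal sketch under several assumptions: the index contents and the function name `answer` are hypothetical, the powerset glue is computed with the expression of Theorem 1, and the query is assumed to contain at least one keyword.

```python
from functools import reduce

def glue(x, y):
    """Interval glue on (start_frame, end_frame) tuples."""
    return (min(x[0], y[0]), max(x[1], y[1]))

def pairwise_glue(X, Y):
    return {glue(x, y) for x in X for y in Y}

def powerset_glue(X, Y):
    """Powerset glue via the transformed expression of Theorem 1."""
    return pairwise_glue(pairwise_glue(X, X), pairwise_glue(Y, Y))

def answer(index, query):
    """Select valid shots per keyword, then fold them with powerset glue."""
    shot_sets = [{shot for shot, kws in index.items() if k in kws}
                 for k in query]
    return reduce(powerset_glue, shot_sets)

# Hypothetical index: (start_frame, end_frame) -> keywords.
index = {
    (0, 9): {"dog"},
    (10, 19): {"run"},
    (20, 29): {"man"},
    (30, 39): {"dog", "run"},
}
print(sorted(answer(index, ["dog", "run", "man"])))
# [(0, 29), (0, 39), (10, 39), (20, 39)]
```

If any keyword has no valid shot, its set is empty and the whole answer is empty, which agrees with the requirement that every keyword appear somewhere in an answer interval.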

Consider again the query $Q$ = {Matsui, homer} that we mentioned in
Section 2.1. Suppose we have the following shots indexed by the corresponding
keywords (see Figure 1 in the Introduction).

□ $i_3$ [1350, 1429] → {Matsui}

□ $i_4$ [1430, 1469] → {Yamada, straight, Matsui, hit}

□ $i_5$ [1470, 1499] → {homer}

□ $i_6$ [1500, 1529] → {}

□ $i_7$ [1530, 1550] → {Matsui, home plate}

Supposing $A_1$ and $A_2$ are the sets of valid shots for Matsui and homer
respectively, then $A_1 = \{i_3, i_4, i_7\}$ and $A_2 = \{i_5\}$. Then the answer to this
query will be:

$A = \{i_{37}[1350, 1550],\ i_{35}[1350, 1499],\ i_{45}[1430, 1499],\ i_{47}[1430, 1550],\ i_{57}[1470, 1550]\}$
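
The answer set for this example can be recomputed from the definitions, taking the end frame of each glued interval as the largest end frame among its shots, as Section 2.3.1 prescribes. The sketch below encodes the five shots listed above; the function names are illustrative.

```python
def glue(x, y):
    """Interval glue on (start_frame, end_frame) tuples."""
    return (min(x[0], y[0]), max(x[1], y[1]))

def pairwise_glue(X, Y):
    return {glue(x, y) for x in X for y in Y}

def powerset_glue(X, Y):
    """Powerset glue via the transformed expression of Theorem 1."""
    return pairwise_glue(pairwise_glue(X, X), pairwise_glue(Y, Y))

# The five shots of the example, keyed by (start_frame, end_frame).
index = {
    (1350, 1429): {"Matsui"},
    (1430, 1469): {"Yamada", "straight", "Matsui", "hit"},
    (1470, 1499): {"homer"},
    (1500, 1529): set(),
    (1530, 1550): {"Matsui", "home plate"},
}

A1 = {shot for shot, kws in index.items() if "Matsui" in kws}
A2 = {shot for shot, kws in index.items() if "homer" in kws}

for interval in sorted(powerset_glue(A1, A2)):
    print(interval)
# (1350, 1499)
# (1350, 1550)
# (1430, 1499)
# (1430, 1550)
# (1470, 1550)
```

Every interval that starts before frame 1470 and involves the homer shot ends no earlier than frame 1499, the last frame of that shot.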

Up to this point we have not considered the lengthy intervals that glue
operations produce if any temporally far-off shot associated with either keyword,
Matsui or homer, is available within the video stream. The following section
explains how to avoid considering such intervals by using filter techniques.

3. FILTERING TECHNIQUES 

One problem we encountered while defining our glue operations is the
existence of ‘noise’ within an answer interval. Intuitively, ‘noise’ is a single shot
or a sequence of shots that cannot be matched with any of the terms appearing
in a query. Existing interval operations such as intersection, union and
concatenation defined in (Oomoto and Tanaka, 1993) (Weiss et al., 1995) (Hwang and
Subrahmanian, 1996) (Hjelsvold et al., 1996) do not offer adequate support for
computing answer intervals containing ‘noise’. This is mainly because
keywords-based queries are generally interpreted as: “In an answer interval,

□ all query keywords must appear throughout it (AND-type), 

□ at least one of the query keywords must appear throughout it (OR-type)”. 

In the actual video, however, as we mentioned above, a meaningful interval
(s3 to s7 in Figure 6) may contain a short sequence that acts only as a filler shot.
If we strictly interpret a video query as an AND/OR-type query, an interval
will not be included in the query result even if it contains only a single filler shot.
In practical applications of video databases, such a strict interpretation of the
query makes little sense.




Figure 6 A possible meaningful interval (s3-s7) containing a filler shot (s6) and 
an irrelevant interval (s5-s30) containing noisy shots 



3.1. IRRELEVANT INTERVALS 

Given a query, our initial goal is to compute as many candidate answer 
intervals as possible. While we have achieved this goal by defining 
the glue operations, we have also created a new problem for ourselves. Our 
assumption is that video streams are segmented only on the basis of physical 
shot boundaries and that we have no prior knowledge about the boundaries of 
meaningful intervals. As a result, when composing answer intervals, 
the glue operations will keep producing longer intervals as long as valid 
shots associated with the query keywords can be found for gluing together. In the 
worst case, if two query keywords happen to fall at the first and at the 
final shot of the video stream, the full one-hour video may be returned 
as a valid answer. This will ultimately result in a large set of 
unwanted intervals. For example, in Figure 6, the answer interval from 
s5 to s30 is one such unwanted interval for the query {Matsui, homer}, 
because the keyword Matsui is associated with a distantly separated shot 
s30 in the same video stream. An answer set should exclude as many unwanted 
intervals as possible if we are to achieve an effective query mechanism. 

3.2. INTERVAL FILTERS 

Here, we discuss our filtering techniques for discarding intervals that can 
be regarded as irrelevant to a given query. Intuitively, we provide a filter 
function that takes a set of video intervals as input and returns the subset that 
meets some necessary conditions for a query match. As far as query processing 
is concerned, excluding irrelevant intervals only after computing all the 
possible answer intervals will have very little impact on query efficiency. A 
great amount of computation can be saved if, at the initial stage of query 
processing, we can discard intervals that will eventually become useless for 
computing relevant answer intervals. In order to do so, the filter function must 
possess certain algebraic properties which we will describe below. Given that 
a filter function possesses such properties, it can be safely integrated with the 
glue operations. 

The simplified definition of powerset glue that we stated in Theorem 1 must 
be well-supported by the filter function. Formally, for any two interval sets X 
and Y, the following must hold: 

$$F(X \otimes Y) = F(F(X \oplus X) \oplus F(Y \oplus Y))$$
where F is an interval filter function, $\oplus$ denotes (pairwise) glue and 
$\otimes$ denotes powerset glue. 

If a filter function satisfies this property, all the 
desired intervals will be included in the answer set. However, in order to 
exclude the irrelevant intervals at the initial stage of processing, the following 
must also hold for any two intervals x and y and any two interval sets X and Y: 

$$F(x \oplus y) = F(F(x) \oplus F(y))$$ and 

$$F(X \oplus Y) = F(F(X) \oplus F(Y)),$$
where F is an interval filter function. 

In the following sections, we describe two interval filters, time-window and 
maximal noise-width, that are practically applicable to video queries. We will 
also show that the proposed glue operations can be well-integrated with these 
filters. 

3.2.1 Time-Window Filter. A video interval has a temporal duration 
that is generally expressed either in temporal units such as seconds or minutes, or 
in number of frames. It is natural for a query issuer to assume that meaningful 
intervals fall within a certain time-window such as 40 seconds or 1200 frames. 
Users should be able to specify such a time-window filter so that any answer 
interval longer than the specified duration will be filtered out from the response 
set. Below, we will show that the time-window filter not only excludes unwanted 
intervals from the answer set but also reduces the number of candidate intervals 
to be considered even before composing the answer intervals. 

Suppose $|i|$ denotes the temporal duration of an interval $i$, i.e. 
$|i| = f_e(i) - f_s(i)$. We define a time-window filter $F_w$ as a mapping from a set of 
intervals X to a subset of X. We first define $F_w$ for each interval $i$ in X as: 

$$F_w(i) = \begin{cases} i, & \text{if } |i| \le w \\ \text{undefined}, & \text{otherwise} \end{cases}$$

where $w$ is a specified time-window. 

For any interval set X, we extend this definition to the following: 

$$F_w(X) = \{\, i \mid i \in X \text{ and } |i| \le w \,\}.$$
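
As a minimal sketch of the set-level definition, the filter can be written as a one-line set comprehension. Intervals are assumed to be inclusive `(start, end)` frame tuples, and the function names and the sample window of 120 frames are illustrative choices, not values from the paper.

```python
def duration(i):
    """Temporal duration |i| = f_e(i) - f_s(i) of an interval (start, end)."""
    return i[1] - i[0]

def time_window_filter(X, w):
    """Keep only the intervals of X whose duration does not exceed w."""
    return {i for i in X if duration(i) <= w}

# Candidate answer intervals, in frames (hypothetical input).
candidates = {(1350, 1499), (1350, 1550), (1430, 1499), (1430, 1550), (1470, 1550)}

kept = time_window_filter(candidates, w=120)
for interval in sorted(kept):
    print(interval, duration(interval))
```

Intervals for which the per-interval filter would be "undefined" are simply absent from the returned set.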

3.2.2 Maximal Noise-Width Filter. As mentioned above, while a 
short ‘noise’ that acts only as a filler shot within a meaningful interval is 
significant for its semantic continuity, a long ‘noise’ can simply be thought of as 
a sequence of shots that separates intervals which are temporally far apart. Here, 
we give the formal definition of ‘noise’ and provide a new interval filter for 
discarding intervals containing a long ‘noise’. 

For a keyword $k$, we define ‘noise’ as a set of intervals in which the 
keyword $k$ does not appear. Suppose $X$ is the set of all the intervals where the 
keyword $k$ appears and $U$ is the set of all the subintervals of the video data in 
the database. Then, for the keyword $k$, noise is computed as: 

$$Noise(k) = \max(\bar{X})$$ where 

$$\bar{X} = \{\, i \in U \mid (\forall i' \in X)(i' \cap i = \emptyset) \,\}$$ and 

$$\max(Z) = \{\, i \mid i \in Z \text{ and } (\forall i' (i' \ne i) \in Z)(i \supseteq i' \text{ or } i \cap i' = \emptyset) \,\}$$

For a set of keywords $K = \{k_1, k_2, \ldots, k_n\}$, ‘noise’ is defined as the set of 
those intervals in which none of the keywords $k_1, k_2, \ldots, k_n$ appears at all. In 
order to compute the ‘noise’ for a set of keywords, we need to provide the 
usual definition of the interval intersection operation and its set-variant. 

Given two video intervals $i_1$ and $i_2$, the operation interval intersection ($\odot$) 
on these two video intervals yields a single video interval $i$ as follows: 

$$i_1 \odot i_2 = i[f_s, f_e]$$ where $f_s \le f_e$ and 

$$f_s = \max(f_s(i_1), f_s(i_2))$$ and 

$$f_e = \min(f_e(i_1), f_e(i_2))$$

Thus, supposing there are two video intervals $i_1[10, 20]$ and $i_2[15, 40]$, then 
$i_1 \odot i_2 = i[15, 20]$. 

The interval set intersection ($\odot$) operation on two sets of video intervals X 
and Y returns a set of video intervals constituting the pairwise intersection ($\odot$) 
between the elements of the two input sets. 

$$X \odot Y = \{\, x \odot y \mid x \in X \text{ and } y \in Y \,\}$$




Figure 7 Definition of ‘Noise’ 



Now, supposing $X_1, X_2, \ldots, X_n$ are the sets of intervals associated with 
the keywords $k_1, k_2, \ldots, k_n$ respectively, then the ‘noise’ for such a set of 
keywords is defined as: 

$$Noise(K) = Noise(k_1) \odot \cdots \odot Noise(k_n)$$

$$= \max(\bar{X}_1) \odot \cdots \odot \max(\bar{X}_n)$$

Given a query consisting of a set of keywords, ‘noise’ 
can be computed by applying these definitions. For example, in Figure 7, 
$X = \{x_1[30, 50], x_2[90, 120]\}$ is a set of intervals associated with the keyword 
$k_x$. Similarly, $Y = \{y_1[40, 60], y_2[70, 80]\}$ is another set of intervals 
associated with the keyword $k_y$. ‘Noise’ for each of $k_x$ and $k_y$ can be computed by 
$Noise(k_x)$ and $Noise(k_y)$ respectively, as shown in the figure. By performing 
the interval set intersection operation on these two sets of intervals, we get the 
set of noisy intervals (containing ‘noise’) $Noise(K)$ for the combined set of 
keywords $K = \{k_x, k_y\}$. 

For a set of keywords $K$, we can compute the ‘noise’ contained in an 
arbitrary interval $i$ by applying the following. 

$$N_K(i) = \{i\} \odot Noise(K)$$

For a set of keywords $K$, Figure 7 shows the ‘noise’ $N_K(i)$ contained in an 
interval $i = x_1 \oplus y_2$, which is obtained by intersecting the interval $i$ 
with each interval of the set $Noise(K)$. 
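
The definitions above can be traced on the intervals of the Figure 7 example with the following sketch. It makes several assumptions that are not in the paper: intervals are inclusive `(start, end)` tuples over integer frame numbers, the whole video spans frames 0 to 150 (the figure's actual extent is not recoverable here), and the maximal noise intervals are computed as the gaps between keyword intervals rather than by enumerating all subintervals U. All function names are illustrative.

```python
def glue(x, y):
    """Interval glue: smallest interval covering both x and y (assumed definition)."""
    return (min(x[0], y[0]), max(x[1], y[1]))

def intersect(a, b):
    """Interval intersection; None when the two intervals do not overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None

def set_intersect(X, Y):
    """Pairwise intersection between the elements of two interval sets."""
    pairs = (intersect(x, y) for x in X for y in Y)
    return {z for z in pairs if z is not None}

def noise(keyword_intervals, video):
    """Maximal sub-intervals of `video` that overlap none of `keyword_intervals`."""
    gaps, cursor = set(), video[0]
    for start, end in sorted(keyword_intervals):
        if start > cursor:
            gaps.add((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= video[1]:
        gaps.add((cursor, video[1]))
    return gaps

VIDEO = (0, 150)                    # assumed extent of the video, in frames
X = {(30, 50), (90, 120)}           # intervals for keyword k_x
Y = {(40, 60), (70, 80)}            # intervals for keyword k_y

noise_K = set_intersect(noise(X, VIDEO), noise(Y, VIDEO))
i = glue((30, 50), (70, 80))        # i = x1 glued with y2
noise_in_i = set_intersect({i}, noise_K)

print(sorted(noise_K))
print(i, sorted(noise_in_i))
```

Under these assumptions the only noise inside the glued interval is the gap between y1 and y2, where neither keyword appears.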

Based on this definition of ‘noise’, we now define a new filter. First, we 
specify $N$ as the maximal noise-width that can be allowed in an answer 
interval. $N$ is expressed in terms of temporal duration such as seconds or number 
of frames. Then, for a query expressed by a set of keywords $K$, we define a 
maximal noise-width filter $F_{N,K}$ as a mapping from a set of intervals X to a 
subset of X. We first define $F_{N,K}$ for each interval $i$ in X as: 

$$F_{N,K}(i) = \begin{cases} i, & \text{if } \max\{\, |i'| \mid i' \in N_K(i) \,\} \le N \\ \text{undefined}, & \text{otherwise} \end{cases}$$

where $N$ is a specified maximal noise-width. 

Again, for any interval set X, we extend this definition into the following: 

$$F_{N,K}(X) = \{\, i \mid i \in X \text{ and } \max\{\, |i'| \mid i' \in N_K(i) \,\} \le N \,\}$$
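
A possible sketch of this filter is shown below. It assumes that the noise set for the query keywords has already been computed and is passed in as `noise_K`; intervals are inclusive `(start, end)` frame tuples, an interval containing no noise at all is treated as having noise width 0, and the names and sample values are illustrative only.

```python
def intersect(a, b):
    """Interval intersection; None when the two intervals do not overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None

def noise_in(i, noise_K):
    """Noise contained in interval i: {i} intersected with every noise interval."""
    parts = (intersect(i, n) for n in noise_K)
    return {p for p in parts if p is not None}

def noise_width_filter(X, noise_K, N):
    """Keep the intervals of X whose longest contained noise is at most N."""
    kept = set()
    for i in X:
        widths = [end - start for start, end in noise_in(i, noise_K)]
        if max(widths, default=0) <= N:
            kept.add(i)
    return kept

# Hypothetical noise set and candidate intervals, in frames.
noise_K = {(0, 29), (61, 69), (81, 89), (121, 150)}
candidates = {(30, 80), (30, 150)}

print(sorted(noise_width_filter(candidates, noise_K, N=10)))
```

Here (30, 150) is discarded because it swallows the long noise interval (121, 150), while (30, 80) contains only a short gap and is kept.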

3.2.3 Glue Operations and Filter Functions. As stated above, in order 
to incorporate a filter function easily in our query mechanism, 
it must possess certain algebraic properties. The following theorem states that the 
proposed filter functions can indeed be integrated with the powerset glue operation 
for computing desired answer intervals to video queries. 

Theorem 2 For any interval sets X and Y, the following expression holds: 

$$F(X \otimes Y) = F(F(X \oplus X) \oplus F(Y \oplus Y))$$

where F is either a time-window ($F_w$) or a maximal noise-width ($F_{N,K}$) filter. 
Proof: See Appendix. 

The following two lemmas present some fundamental properties which 
provide key leverage for deriving Theorem 2. 

Lemma 1 For any two intervals x and y, the interval glue operation has the 
following property. 

$$F(x \oplus y) = F(F(x) \oplus F(y)),$$

where F is either a time-window ($F_w$) or a maximal noise-width ($F_{N,K}$) filter. 
Proof: See Appendix. 

Lemma 2 For any two sets X and Y whose elements are indexed video 
intervals, the pairwise glue operation has the following property. 

$$F(X \oplus Y) = F(F(X) \oplus F(Y)),$$

where F is either a time-window ($F_w$) or a maximal noise-width ($F_{N,K}$) filter. 
Proof: See Appendix. 

Despite its computational complexity as compared to the time-window filter, the 
maximal noise-width filter is one natural way of reducing an over-populated answer 
set. In practical applications, unlike the time-window filter, a high value for the 
maximal noise-width can be assumed implicitly by the system, because 
we are concerned with filtering out only those intervals which contain a considerable 
length of ‘noise’. 

Theorem 2 is applied to compute the desired answer intervals to a query while 
filtering out irrelevant intervals from the answer set. In Lemma 2, we showed 
that $F(X \oplus Y) = F(F(X) \oplus F(Y))$. One big advantage of this property is 
that the number of candidate intervals to be considered is greatly reduced even 
at the initial stage of query processing by applying the mapping functions F(X) 
and F(Y) with the time-window filter. Consequently, an efficient query 
mechanism can be achieved by integrating filter functions with our video queries. 
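
The effect of Lemma 2 can be illustrated with a short sketch that compares filtering after gluing with filtering before gluing, using the time-window filter. It assumes glue returns the smallest covering interval and that intervals are inclusive `(start, end)` frame tuples; the sample sets and the window of 40 frames are made up for illustration.

```python
def glue(x, y):
    """Interval glue: smallest interval covering both x and y (assumed definition)."""
    return (min(x[0], y[0]), max(x[1], y[1]))

def pairwise_glue(X, Y):
    """Glue every interval of X with every interval of Y."""
    return {glue(x, y) for x in X for y in Y}

def time_window_filter(X, w):
    """Keep only the intervals whose duration (end - start) is at most w."""
    return {i for i in X if i[1] - i[0] <= w}

X = {(0, 10), (100, 400), (500, 520)}   # hypothetical intervals for one keyword
Y = {(15, 30), (505, 900)}              # hypothetical intervals for another
w = 40

# Filter only at the end: all |X| * |Y| pairs are glued first.
late = time_window_filter(pairwise_glue(X, Y), w)

# Filter the inputs first: fewer pairs are glued, same final answer.
FX, FY = time_window_filter(X, w), time_window_filter(Y, w)
early = time_window_filter(pairwise_glue(FX, FY), w)

print(len(X) * len(Y), "pairs without early filtering")
print(len(FX) * len(FY), "pairs with early filtering")
print(sorted(late), sorted(early))
```

In this toy case the number of glued pairs drops from six to two without changing the result, which is the kind of saving the property is meant to provide.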

4. EXPERIMENTAL EVALUATION 

We carried out an experiment to evaluate two performance gains. The first 
performance gain we anticipated was due to the transformation 
of the powerset glue operation into three pairwise glue operations. 
Theoretically, if $X = \{x_1, \ldots, x_m\}$ and $Y = \{y_1, \ldots, y_n\}$ are two sets of 
intervals, the time complexity of the powerset glue operation $X \otimes Y$ becomes $2^{m+n}$. 
However, the time complexity of the transformed powerset glue operation 
$(X \oplus X) \oplus (Y \oplus Y)$ is simply $n^2 m^2$. 
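
To give a feel for the gap between the two bounds, here is a worked comparison for an illustrative case with $m = n = 10$ (these sizes are chosen for illustration and are not taken from the experiment), taking the cost of the untransformed operation as $2^{m+n}$:

$$2^{m+n} = 2^{20} = 1{,}048{,}576 \qquad \text{versus} \qquad n^2 m^2 = 10^2 \cdot 10^2 = 10{,}000$$

Under these assumed sizes the transformed operation needs roughly one hundredth of the glue steps, and the gap widens exponentially as the sets grow.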

Naturally, we observed a very large performance gain from the transformed 
definition of the powerset glue operation. For every query we tried, the ratio of 
the time taken to compose the answer intervals without the transformation to the 
time taken after the transformation was very large. 




Figure 8 A prototype system implemented in Java. A five-minute-long MPEG 
video of a live baseball match was used as experimental data. There were 
31 shots fragmentarily indexed by 13 unique keywords. 



The second performance gain we expected was due to the application 
of the filter function to the queries. Although there was no reduction in the time 
complexity of the powerset glue operations, we could expect a considerable 
performance gain by reducing the candidate intervals before performing powerset 
glue operations on them. The experiment showed that the ratio of the time 
taken to compose the answer intervals without filter functions to the time taken 
after applying the filter functions was also substantially large. However, we 
observed that this type of performance gain depended entirely upon the number 
of candidate intervals that could be discarded at the initial stage. Further 
experiments are necessary to establish a sound basis for benchmarking our 
query mechanism. 
5. RELATED WORK 

There is now growing interest in querying the large resources of digital 
video data. Allen’s work on temporal intervals (Allen, 1983) laid the 
foundation for much research concerned with time intervals (Little and Ghafoor, 
1993) (Lorentzos and Mitsopoulos, 1997). He showed that there are 13 
distinct temporal relationships that can exist between two arbitrary time intervals. 
Some research on video databases has been greatly influenced by Allen’s 
temporal model. In (Little and Ghafoor, 1993), temporal-interval-based 
models have been presented for time-dependent multimedia data such as video. 
However, our work is orthogonal to work based on temporal logic. We 
were rather interested in the semantics of a keywords-based video query and 
focused our work on the operations required for synthesizing new intervals in 
order to answer such queries. As we saw in the example above, in any 
meaningful video interval, the same keyword may emerge an indefinite number 
of times in no particular order. Further investigation is required if we are to 
integrate query condition specifications based on temporal relationships 
into the current query mechanism^. 

Work by (Oomoto and Tanaka, 1993) (Weiss et al., 1995) (Hwang and 
Subrahmanian, 1996) (Hjelsvold et al., 1996) puts emphasis on the annotation model 
rather than on the query model. As stated above, no matter what the annotation 
models are, defining all possible interval answers in advance in the indexing 
scheme itself is not feasible with the present technology. Although they have 
defined several interval operations to compute new intervals from existing 
intervals, they all lack the kind of algebraic operations required for our purpose. 

The Informedia project (Wactlar et al., 1996) uses a full-text information 
retrieval system based on the well-known technique of tf/idf (term frequency/inverse 
document frequency) for keywords-based queries on video databases. However, 
it also presumes the answer intervals in terms of the granularity of the 
indexed units, or what is called video paragraphs. 

The idea of assuming two visually similar shots as components of different 
story units on the basis of temporal locality was first proposed in (Yeung et al., 
1996). They have applied this idea to clustering video shots that are visually 
similar and temporally local. The basic assumption is that if two visually simi- 
lar shots do not fall within a certain time window, they can be considered as the 
shots from different story units. Our maximal noise-width filter is somewhat 
inspired by this concept. 



^The result of our preliminary work on an algebraic video query model is going to be published in the 
journal of IEEE Transactions on Knowledge and Data Engineering (Pradhan et al., 2000). Our recent work 
deals with temporal filters which allow us to specify temporal relationships between keywords in a video 
query. 
6. CONCLUSIONS 

In any video database system with fragmentarily indexed intervals, 
end-users often have difficulty in retrieving the intervals that they desire to see. This 
is because the intervals they are hoping to find may not have been defined 
as answer units in the database. In such cases, intervals need to be computed 
dynamically from the indexed units that currently exist in the database. 

Algebraic operations such as union, intersection and concatenation do not 
always produce the desired answers since these operations do not consider the 
presence of ‘noise’ in a meaningful interval, which is so natural and common 
in video data. In order to compute interval answers to a video query 
represented by a set of keywords, we defined a set of interval operations, called glue 
operations, which is the main contribution of this paper. Given a query, these 
operations enable us to compute all the possible boundaries for interval answers 
from a set of stored video shots. 

We also proposed a set of interval filters that can be incorporated in video 
queries so that irrelevant answer intervals are excluded from the answer set. 
We then investigated the characteristics of each filter and found that the filters 
can be directly integrated into the proposed powerset glue operation. As a 
result, a considerable number of irrelevant intervals can be removed at the 
initial stage of query processing, which has led to a groundwork for an efficient 
query mechanism. 
Appendix 

Theorem 1: For any interval sets X and Y, the following expression holds. 

$$X \otimes Y = (X \oplus X) \oplus (Y \oplus Y)$$

Proof: Suppose $U = X \otimes Y$ and $V = (X \oplus X) \oplus (Y \oplus Y)$. To prove $U = V$, 
we need to show that $U \supseteq V$ and $U \subseteq V$. First, to prove $U \supseteq V$, consider an arbitrary element 
$z \in V$. Then, there must be $x \in (X \oplus X)$ and $y \in (Y \oplus Y)$ such that $x \oplus y = z$. By the 
definition of interval glue ($\oplus$), the following four possible cases can be considered on the basis 
of what values $z$ takes for its starting and ending frame. 

I. $z = z[f_s(x), f_e(y)]$ if $f_s(x) \le f_s(y)$ and $f_e(y) \ge f_e(x)$ 

II. $z = z[f_s(y), f_e(x)]$ if $f_s(y) \le f_s(x)$ and $f_e(x) \ge f_e(y)$ 

III. $z = z[f_s(x), f_e(x)]$ if $f_s(x) \le f_s(y)$ and $f_e(x) \ge f_e(y)$ 

IV. $z = z[f_s(y), f_e(y)]$ if $f_s(y) \le f_s(x)$ and $f_e(y) \ge f_e(x)$ 

We will first consider Case I. Since $x \in (X \oplus X)$, there must be $x' \in X$ such that $f_s(x') = 
f_s(x)$ and $f_e(x') \le f_e(x)$. Similarly, since $y \in (Y \oplus Y)$, there must be $y' \in Y$ such that 
$f_e(y') = f_e(y)$ and $f_s(y') \ge f_s(y)$. Now, since U is the powerset glue of X and Y, there must 
be $z' \in U$ such that $z' = x' \oplus y'$. 

Since $f_s(x') = f_s(x)$, $f_s(x) \le f_s(y)$ and $f_s(y') \ge f_s(y)$, we can conclude $f_s(x') \le 
f_s(y')$. Similarly, since $f_e(y') = f_e(y)$, $f_e(y) \ge f_e(x)$ and $f_e(x') \le f_e(x)$, we can conclude 
$f_e(y') \ge f_e(x')$. Therefore, $z' = z'[f_s(x'), f_e(y')]$. Since $f_s(x') = f_s(x)$ and $f_e(y') = 
f_e(y)$, we can say that $z' = z$. Therefore, $z \in U$ too. Refer to Fig. 9. 

A similar proof can be shown for Case II. The proof for Case III can be shown by considering 
two intervals in X and one in Y (refer to the right hand side of Fig. 9). Case IV can be proved 
in a similar manner. Hence, $U \supseteq V$. 

Figure 9 Illustration for the Proof of Theorem 1 

Next we show that $U \subseteq V$. Again, consider an arbitrary element $z \in U$. We know that U is 
the product of the powerset glue operation between X and Y. Hence, we can write $z = x \oplus y$ 
where x and y are yielded by performing the pairwise glue operation on one or more elements of 
X and Y respectively. Here also, by the definition of interval glue ($\oplus$), four possible cases can 
be considered on the basis of what values $z$ takes for its starting and ending frame. 

I. $z = z[f_s(x), f_e(y)]$ if $f_s(x) \le f_s(y)$ and $f_e(y) \ge f_e(x)$ 

II. $z = z[f_s(y), f_e(x)]$ if $f_s(y) \le f_s(x)$ and $f_e(x) \ge f_e(y)$ 

III. $z = z[f_s(x), f_e(x)]$ if $f_s(x) \le f_s(y)$ and $f_e(x) \ge f_e(y)$ 

IV. $z = z[f_s(y), f_e(y)]$ if $f_s(y) \le f_s(x)$ and $f_e(y) \ge f_e(x)$ 

We will first consider Case I. Since x is yielded by performing the pairwise glue operation on 
one or more elements of X, there must be $x' \in X$ such that $f_s(x') = f_s(x)$ and $f_e(x') \le 
f_e(x)$. Similarly, since y is yielded by performing the pairwise glue operation on one or more 
elements of Y, there must be $y' \in Y$ such that $f_e(y') = f_e(y)$ and $f_s(y') \ge f_s(y)$. 

Again, since $X \oplus X$ is a pairwise glue operation between X and X, any arbitrary element 
in X will definitely be in $X \oplus X$ too. Hence, $x' \in (X \oplus X)$. Similarly, $y' \in (Y \oplus Y)$. 
Therefore, there must be a $z' \in V$ such that $z' = x' \oplus y'$. 

Since $f_s(x') = f_s(x)$, $f_s(x) \le f_s(y)$ and $f_s(y') \ge f_s(y)$, we can conclude $f_s(x') \le 
f_s(y')$. Similarly, since $f_e(y') = f_e(y)$, $f_e(y) \ge f_e(x)$ and $f_e(x') \le f_e(x)$, we can conclude 
$f_e(y') \ge f_e(x')$. Therefore, $z' = z'[f_s(x'), f_e(y')]$. Since $f_s(x') = f_s(x)$ and $f_e(y') = 
f_e(y)$, we can say that $z' = z$. Therefore, $z \in V$ too. 

A similar proof can be shown for Case II. The proof for Case III can be shown by considering 
two intervals in X and one in Y. Case IV can be proved in a similar manner. Hence, $U \subseteq V$. 
This completes the proof. □ 

Lemma 1: For any two intervals x and y, the interval glue operation has the following property. 

$$F(x \oplus y) = F(F(x) \oplus F(y)),$$

where F is either a time-window ($F_w$) or a maximal noise-width ($F_{N,K}$) filter. 

Proof: Let $W$ denote either the specified time-window width $w$ or the specified 
maximal noise-width $N$. Also, let $|i|$ denote the temporal duration of an interval $i$ in the case 
of the time-window filter, and the maximal noise it contains in the case of the maximal noise-width filter. 
The proof is obvious if $|x| \le W$ and $|y| \le W$. However, if $|x| > W$ or $|y| > W$, 
then $F(F(x) \oplus F(y))$ = undefined. We know that $|x \oplus y| \ge |x|, |y|$. Hence, it is obvious 
that $|x \oplus y| > W$ if either $|x| > W$ or $|y| > W$. Since $F(x \oplus y)$ = undefined when 
$|x \oplus y| > W$, we thus complete the proof. □ 

Lemma 2: For any two sets X and Y whose elements are indexed video intervals, the pairwise 
glue operation has the following property. 

$$F(X \oplus Y) = F(F(X) \oplus F(Y)),$$

where F is either a time-window ($F_w$) or a maximal noise-width ($F_{N,K}$) filter. 

Proof: Let U denote $F(X \oplus Y)$ and V denote $F(F(X) \oplus F(Y))$. Then, in order to prove 
that $U = V$, all we have to show is $U \supseteq V$ and $U \subseteq V$. 

Consider any arbitrary $x \in F(X)$. Naturally, $x \in X$. Therefore $X \supseteq F(X)$. Similarly, 
$Y \supseteq F(Y)$. It follows that $X \oplus Y \supseteq F(X) \oplus F(Y)$. Also, $F(X \oplus Y) \supseteq F(F(X) \oplus F(Y))$. 
Hence, $U \supseteq V$. 

Next, consider an arbitrary element $z \in U$. Then $|z| \le W$ and there must be $x \in X$ and 
$y \in Y$ such that $x \oplus y = z$. Since $|z| \le W$ implies $|x| \le W$, we can say that $x \in F(X)$. 
Similarly, $y \in F(Y)$. Also, since $|x \oplus y| \le W$, $(x \oplus y) \in F(F(X) \oplus F(Y))$. Therefore, 
$z \in F(F(X) \oplus F(Y))$. Hence, $U \subseteq V$. The proof is now complete. □ 

Theorem 2: For any interval sets X and Y, the following expression holds: 

$$F(X \otimes Y) = F(F(X \oplus X) \oplus F(Y \oplus Y))$$

where F is either a time-window ($F_w$) or a maximal noise-width ($F_{N,K}$) filter. 

Proof: By the simplified definition of powerset glue (Theorem 1), $X \otimes Y = (X \oplus X) \oplus (Y \oplus Y)$. By applying 
the filter F to both sides, $F(X \otimes Y) = F((X \oplus X) \oplus (Y \oplus Y))$. Suppose $X \oplus X = X'$ 
and $Y \oplus Y = Y'$. According to Lemma 2, $F(X' \oplus Y') = F(F(X') \oplus F(Y'))$. By 
substitution, $F((X \oplus X) \oplus (Y \oplus Y)) = F(F(X \oplus X) \oplus F(Y \oplus Y))$. 

Hence, $F(X \otimes Y) = F(F(X \oplus X) \oplus F(Y \oplus Y))$. This proves the theorem. □ 
Abstract Multimedia searching over the Internet has gained substantial popularity 
in the past two years, pushing the emergence of multimedia portals. 
The key issues are how to access multimedia data and, consequently, how to 
represent multimedia data so as to ease and speed up such accesses. 
Access methods are evolving from keywords and related metadata to 
more intelligent representations (shadow semantics such as information 
relationships inside the video itself). Furthermore, technologies such as 
Java, along with the growing number of Web browsers that can 
execute Java applets, facilitate distributed multimedia access. 
Networking and computational performance are key concerns when 
considering the use of Java to develop performance-sensitive 
distributed multimedia search engines. This paper makes three 
contributions to the study of such performance-sensitive distributed 
multimedia search engines. First, we describe the innovative 
architecture of MEVISE, the MediaSys Video Search Engine over a 
large scale network. Second, we introduce the content-based 
representation of the video data inside the MediaSys server. Third, we 
present the search capabilities of MEVISE. 

Keywords: Video Search Engine, Information Retrieval, Multimedia Database 



1. INTRODUCTION 

Much work in the field of multimedia management systems has 
followed generic approaches. Furthermore, visualisation tools have been 
developed, such as video search engines by content (Ram 1999), 
especially for the manipulation and the storage of video features (Flickner et 
al. 1995) such as colours, edges, or shapes. 

Also, the design of data structures has been studied for modelling (Frakes 
& Baeza-Yates 1992, Ghandeharizadeh 1996), representing and storing 
multimedia video content (Ozden 1996), as well as for efficiently accessing 
video data (Guttman 1984, Sellis et al. 1987, Beckmann et al. 1990, Lin et 
al. 1994, Ahanger & Little 1996, Bertino 1997). 

However, problems of large scale distributed multimedia information 
systems such as the quality of the search, the performance, the flexibility, 
and the customisability have been mostly ignored. MediaSys provides such 
an extensible infrastructure for multimedia management. For instance, plug- 
ins for video search operations can be dynamically associated and integrated. 
This eases its upgrades according to the user's requirements and according to 
the evolving technology. 

The remainder of this paper is organised as follows: Section 2 motivates 
and describes the architecture of our multimedia distributed system. Then, 
Section 3 describes more particularly the MEVISE component, i.e., the 
client part of the architecture. Next, Section 4 introduces MediaSys, the 
server and the content-based representation for video storage management. 
Finally, we give some concluding remarks. 



2. ARCHITECTURE OF THE DISTRIBUTED 
MULTIMEDIA SYSTEM 

2.1 MOTIVATIONS AND GENERAL ARCHITECTURE 

Video retrieval plays a key role in cultural, medical, or environmental 
applications. The demand for systems that support video searching, 
visualisation, analysis and processing has increased significantly, as reported 
by the Gardner Group. This increase is due to the advent of the digital world 
and the generalisation of direct production of video. Also, due to the Internet, 
distribution through portals is becoming more and more important. 

Figure 1 shows the general architecture of a large-scale distributed 
multimedia system. In this type of environment, various devices produce 
video data that are transferred to the multimedia storage systems, i.e., 
MediaSys servers. Users with various roles and profiles can search for and 
access such video data using MEVISE clients. They can visualise them 
and/or analyse them according to some semantics. 

The need for video search engines over distributed environments is also 
driven by economic factors. Actors and holders of video data such as 
museums and news broadcasting companies require search engines to unify the 
search over their multiple and distinct video repositories. Furthermore, they 
are required to provide quality of service (QoS) in terms of high 
performance and functionality. High-speed networks, such as ATM or Fast 
Ethernet, allow the transfer of video efficiently, reliably, and economically. 




Figure 1 Architecture of the Multimedia Distributed System 

2.2 REQUIREMENTS OF A VIDEO SEARCH ENGINE 
OVER A LARGE SCALE DISTRIBUTED 
ENVIRONMENT 

A typical video search engine over a distributed environment (Mungee 
1998, Coulson 1999) must be: 

- Extensible to manipulate new video formats and to adapt to various data 
models; 

- Efficient to process video data in order to get their semantics and to 
deliver it to the users; 

- Scalable to support the growing demands of video searching over 
large-scale systems; 

- Flexible to dynamically reconfigure video processing features to cope 
with changing requirements; 

- Reliable to ensure QoS in terms of correctness and availability of video 
data when requested by users; 

- Cost-effective to minimize the overhead of accessing video across large 
scale distributed systems; 

- Secure to ensure that data confidentiality is preserved according to 
the user's role. 

Developing a distributed video search engine that meets all of these 
requirements is challenging, particularly because some features conflict with 
others. For instance, efficiency often requires high-performance computers 
and high-speed networks, thereby raising costs as the number of users 
increases. 

2.3 ADOPTING WEB AND JAVA TECHNOLOGIES 

The strong interest in the Java language has followed the ubiquity of 
inexpensive Web browsers. These have brought Web technology to the 
desktop of any computer. Over the past two years, Web technology 
has enabled the Java programming language to spark considerable interest 
among software developers. Its popularity stems from its flexibility, 
portability, and relative simplicity compared to other object-oriented 
programming languages (Jain & Schmidt 1997). 

The applet feature supported by Java is particularly relevant to 
distributed multimedia systems. An applet is a Java class that can be 
downloaded from a Web server and run in a context application such as a 
Web browser or an applet viewer. The ability to download Java classes 
across a network can simplify the development and configuration of efficient 
and reliable distributed multimedia search engines (Schmidt 1997). 

The MEVISE prototype has been developed as a Java applet. Therefore, 
it can run on any Java-enabled browser that supports the standard Java 
windowing toolkit. MEVISE leverages the convenience of Java to 
manipulate video and provides video-processing capabilities to users 
connected via the Web. In our experience, developing a video search engine 
over a distributed environment in Java is relatively cost effective. 

Performance is a key requirement in large-scale distributed multimedia systems, 
regardless of any particular application domain. Meeting the performance 
demands requires the following support from MEVISE. First, its video 
searching and processing capabilities must be precise and efficient. Second, 
its networking mechanisms must download and upload large videos rapidly. 
Assuming that efficient video processing algorithms are used to extract 
semantic characteristics, the performance of a MEVISE applet depends 
largely on the efficiency of the hardware and of the JVM (Java Virtual 
Machine) implementation on which the applet is running. 
2.4 DESIGN OF THE MEVISE PLATFORM 

Figure 2 shows the detailed architecture of the MEVISE platform. The 
two major components of the initial version are the MEVISE client and the 
MediaSys server. MEVISE is an extension of the Name-It prototype (Satoh 
1997). The MediaSys server is based on the AHYDS platform (Active 
HYpermedia Delivery System) (Andres & Ono 1998), which is an active 
hypermedia delivery system developed at NACSIS since 1995. Furthermore, 
the MEVISE tool uses the resource manager JACE [URL1] developed at 
Washington University, Saint Louis, Missouri. Finally, it also follows the 
Active Object approach (Lavender & Schmidt 1996). 

The MEVISE client and the MediaSys server are discussed in the 
following two sections. 



[Figure 2 diagram. MEVISE Client: Graphical User Interface; Server Locator; 
Video processing; Video Downloader/uploader; Plug-ins configurator; 
Multimedia data-flow management; JACE. MediaSys Server: Active Hypermedia 
Delivery System; Name-it plug-ins; http plug-ins; Multimedia data-flow 
management; Communication protocols, e.g., HTTP, DICOM.] 

Figure 2 The MEVISE platform 



3. THE CLIENT MEVISE APPLET 

In this section, we give an overview of the major features of MEVISE and 
its Search Component. Furthermore, we sketch the use of fuzzy technology 
for advanced video retrieval by content. 

3.1 THE KEY FEATURES OF MEVISE 

MEVISE allows users to search for names or faces inside a video shot 
information system and to access any video stored in a MediaSys server. In 
addition, the MEVISE applet provides a hierarchical browser that allows 
users to traverse sets of videos on remote MediaSys servers. This makes it 
straightforward to find and select videos across the network, making 
MEVISE quite usable as well as easy to learn. 

As shown in Figure 2, there are three main components in a MEVISE 
client applet: 

- Graphical User Interface: provides a front end to the video search 
engine. It enables users to search for videos according to given criteria 
and to receive them. Figure 3 illustrates the MEVISE Graphical User 
Interface (GUI). 

- Server Locator: locates a URL associated with the user role that can 
reference a video server or an image server (MediaSys-typed server). If the 
URL points to a MediaSys server, its content is listed so that any user can 
browse it and choose a video to retrieve. The Video Downloader module 
described next uses this component. 

- Video Downloader: downloads a video or a set of video frames located by 
the Server Locator and displays the video or the set of video frames in the 
applet. The Video Downloader also ensures that all frames are retrieved and 
displayed properly. 

3.2 THE MEVISE SEARCH COMPONENT 

The input to the Search Component consists of a set of sample captured 
images. First, the Search Component enables the user to retrieve similar 
faces from a set of video frames according to a given name: this is the Face 
Similarity Retrieval Function. The result is a list of captured images 
ordered by decreasing similarity with respect to a ranking weight. Second, 
it enables the user to retrieve associated names according to a selected 
frame: this is the Associated Name Retrieval Function. The result is a list 
of names ordered by decreasing relevance. Third, the Search Component can 
also retrieve the video shots related to a selected face in order to play 
them. 

Search Component for Names according to a given Face 

The search process is based on three steps: 

(1) Find similar faces given an input face 

(2) Find coincident names according to the score 

(3) Evaluate the co-occurrence, taking face similarities and name scores 
into account. 
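
The three steps above can be sketched as follows. This is only an illustration, not the Name-It or MEVISE implementation: the function name `rank_names_for_face`, the per-frame dictionaries, the sample names, and the scoring rule (summing face similarity times name score over shared frames) are all assumptions made for the example.

```python
from collections import defaultdict


def rank_names_for_face(face_similarity, name_scores):
    """Rank candidate names for an input face (illustrative scoring rule).

    face_similarity: {frame_id: similarity of the frame's face to the input face}
    name_scores:     {frame_id: {name: score of that name for the frame}}
    """
    cooccurrence = defaultdict(float)
    # Step 1: the frames holding a similar face are the keys of face_similarity.
    for frame_id, similarity in face_similarity.items():
        # Step 2: names that coincide with this frame, with their scores.
        for name, score in name_scores.get(frame_id, {}).items():
            # Step 3: accumulate co-occurrence from both quantities (assumed rule).
            cooccurrence[name] += similarity * score
    # Highest co-occurrence first.
    return sorted(cooccurrence.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: two frames with faces similar to the input face.
faces = {"frame1": 0.9, "frame2": 0.5}
names = {"frame1": {"Alice": 0.8}, "frame2": {"Alice": 0.4, "Bob": 0.6}}
ranking = rank_names_for_face(faces, names)
```

The search for faces given a name follows the same pattern with the roles of the two dictionaries exchanged.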




[Diagram: face, frames, ID, score, names] 



Search Component for Faces according to a given Name 

The search process is based on four steps: 

(1) Input a name 

(2) Navigate the links between names and scores, and locate the 
occurrences of the input name in the set of frames 

(3) Find coincident faces 

(4) Calculate co-occurrence using face similarities and name scores based on 
the SR-tree indexes. 



4. THE MEDIASYS SERVER 



4.1 GLOBAL DESCRIPTION 

At the other end of the network, the MediaSys server is a 
high-performance, multi-threaded, HTTP application-oriented information 
engine based on the AHYDS (Active Hypermedia Delivery System) platform. A 
detailed description has been given previously (Andres & Ono 1998). It 
stores the MEVISE applet, video data and captured images. Figure 3 
illustrates the interaction between the MediaSys server and the MEVISE 
applet concerning video/captured image transfers. The MEVISE applet uses 
the HTTP GET method to interact with the MediaSys server to request video, 
names or captured images (steps (1) and (2) in Figure 4). The MEVISE applet 
also accesses the MediaSys server to download specific video or image 
retrieval operations. Each image or video filter is a Java class that can be 
downloaded by MEVISE. This design allows MEVISE applets to be 
dynamically configured with video retrieval plug-ins. In addition, the 
MediaSys server supports captured image uploading using the HTTP PUT 
method. This allows the MEVISE applet to save processed images 
persistently at the server. 
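
The GET/PUT exchange described above can be illustrated with a short sketch. It is written in Python rather than as a Java applet, it only builds the requests without sending them, and the server address, resource paths, and helper names are hypothetical.

```python
from urllib.request import Request

BASE_URL = "http://mediasys.example.org"  # hypothetical MediaSys server address


def build_download_request(path):
    """HTTP GET: request a video, a name list, or a captured image."""
    return Request(f"{BASE_URL}/{path}", method="GET")


def build_upload_request(path, image_bytes):
    """HTTP PUT: store a processed captured image on the server."""
    return Request(
        f"{BASE_URL}/{path}",
        data=image_bytes,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


get_req = build_download_request("videos/news01.mpg")
put_req = build_upload_request("captured/face42.jpg", b"\xff\xd8 example bytes")
```

Passing either object to `urllib.request.urlopen` would actually perform the transfer; that step is left out so the sketch has no network dependency.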




Figure 3 Interaction between the MEVISE Client and the MediaSys Server. 

Java applets provide an exception to these security restrictions. In 
particular, the Java applet class provides a method that allows an applet to 
download images from any server reachable via a URL. Since the method 
is defined in the Java applet class, it allows Java to ensure that there is no 
security violation. MediaSys uses this method to download video and 
captured images across the network. Therefore, video and captured images 
to be processed can reside on the MediaSys server from which the MEVISE 
client applet was downloaded. They can also reside on other MediaSys 
servers in the network. However, the video/captured-image data flow can 
upload images only to the server from which the MEVISE applet was 
downloaded. 



[Figure 4: data-flow diagram; step 2: pull (identification of multimedia 
objects)] 

Figure 4 Video/Captured Image Data Flow Management 

4.2 VIDEO STORAGE 

4.2.1 Video Abstraction Processing based on Name-It 

Video abstraction done by Name-It (Satoh 1997) is based on the 
extraction of content information by combining image understanding 
processing and natural language processing. Name-It automatically extracts 
face and name associations as content information from video news or video 
movies as input source. To accomplish this task, the system takes a 
multi-modal video analysis approach: (1) face sequence 
extraction/identification from videos, (2) name extraction from transcripts, 
and (3) video caption recognition. Each method includes several advanced 
image and natural language processing techniques: face tracking, face 
identification, intelligent name extraction using a dictionary, thesaurus, 
and parser, text region detection, image enhancement, character 
recognition, and the integration of these techniques. 

4.2.2 Content-based Representation Storage 

The content-based representation includes three sets of data: the face 
information (FaceMinPosition, FaceMaxPosition, and Frame), the face 
name information (Name) and the relationship with video shots. All this 
information is stored inside the Extended Binary Graph (EBG) data 
structure (Andres & Ono 1997) to provide high-performance access for the 
retrieval process. The major benefits of the graph-based approach are: (1) a 
pointer-based structure enabling both navigation and selection; (2) compact 
relationships between IDs and values (e.g. Name, Video shots); and (3) 
access from IDs to values and reverse access from values to IDs. 
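
Benefit (3), access in both directions, can be illustrated with a toy structure. This is not the EBG itself, whose layout is described in the cited work; the class name `BidirectionalIndex`, its methods, and the sample frame IDs and names are assumptions made for the illustration.

```python
from collections import defaultdict


class BidirectionalIndex:
    """Toy stand-in for one ID/value relationship (e.g. FrameId and Name)."""

    def __init__(self):
        self._values_by_id = defaultdict(set)
        self._ids_by_value = defaultdict(set)

    def link(self, item_id, value):
        # Record the relationship once in each direction.
        self._values_by_id[item_id].add(value)
        self._ids_by_value[value].add(item_id)

    def values_of(self, item_id):
        """Forward access: from an ID to its values."""
        return set(self._values_by_id.get(item_id, ()))

    def ids_of(self, value):
        """Reverse access: from a value to the IDs that carry it."""
        return set(self._ids_by_value.get(value, ()))


name_index = BidirectionalIndex()
name_index.link(101, "Alice")
name_index.link(102, "Alice")
name_index.link(102, "Bob")
```

Keeping both maps in step costs extra memory but makes each direction a single lookup, which is the property the retrieval process relies on.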

Figure 5 summarises the content-based representation of the video 
semantics used by MEVISE. FaceMinPosition and FaceMaxPosition are used to 
locate faces inside a specific frame. The Name-FrameId relationship and the 
VideoShots-FrameId relationship make it possible to traverse the graph data 
structure according to the user's interests. Each kind of information (Name, 
Frame, ...) can be clustered according to the application's requirements. 
Furthermore, the resource supervisor of the MediaSys platform can process 
each attribute of the video abstraction according to the characteristics of 
the application's access. 

[Figure 5: diagram; labels: Index Management, FaceMinPosition] 

Figure 5 EBG support for Video Abstraction 

The MediaSys server also provides a nearest-neighbor search capability. 
It supports a main-memory-oriented SR-Tree structure (Katayama 1997), 
stored as an EBG, in order to index the large number of face positions 
inside frames where disk accesses dominate. In this case, the MediaSys 
server uses a clustering operation in which the SR-Tree is reordered so that 
each memory page contains one subtree. Figures 6a and 6b give an overview 
of the SR-Tree structure and the clustering approach at the node level. Each 
face location is mapped into a multi-dimensional space where the distance 
between two faces corresponds to the similarity between the two faces. 



[Figure 6: diagrams; label: EBG] 

Figure 6a EBG support for Video Abstraction 

Figure 6b EBG support for Video Abstraction 

Each sub-region in memory is determined by the intersection between the 
bounding sphere and the bounding rectangle, as described at the node 
level in Figure 6b. 
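
Because a region is the intersection of a bounding sphere and a bounding rectangle, a lower bound on the distance from a query point to the region is the larger of the distances to the two shapes, which is what a nearest-neighbor search can use to prune subtrees. The sketch below illustrates that bound only; the function names and the tuple-based point representation are assumptions, and nothing here reflects the MediaSys code.

```python
import math


def distance_to_sphere(query, center, radius):
    """Distance from a point to a ball (0 if the point is inside)."""
    return max(0.0, math.dist(query, center) - radius)


def distance_to_rectangle(query, low, high):
    """Distance from a point to an axis-aligned box (0 if inside)."""
    gaps = (max(lo - q, 0.0, q - hi) for q, lo, hi in zip(query, low, high))
    return math.sqrt(sum(gap * gap for gap in gaps))


def distance_to_region(query, center, radius, low, high):
    """Lower bound on the distance to the sphere-rectangle intersection."""
    return max(
        distance_to_sphere(query, center, radius),
        distance_to_rectangle(query, low, high),
    )


# Example: the sphere alone gives 3.0, the rectangle alone gives 4.0.
bound = distance_to_region((5.0, 0.0), (0.0, 0.0), 2.0, (-1.0, -1.0), (1.0, 1.0))
```

A subtree whose bound already exceeds the distance to the best face found so far cannot contain a closer one and can be skipped.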



5. CONCLUSION 

This paper described the design and the architecture of the MEdiaSys 
Video Search Engine (MEVISE), an advanced video search system for 
Internet and/or high-speed network environments. This system enables 
users to search, browse, and retrieve video and captured images 
according to a combination of visual and name features with meta-data 
related to the videos. The MediaSys servers store the meta-data, visual and 
textual features, and both video and captured images over a large-scale 
distributed and heterogeneous system. 

Java still has limitations that need to be addressed to improve future 
versions of MEVISE: 

1- Memory limitations (conflicts between large images and the browser). 

2- Better portability of Java Swing across browsers. 

3- A change to the security model inside Java so that the accessible 
servers can be configured dynamically. 

In conclusion, MEVISE, the client video search engine of the MediaSys 
project, gains its efficiency from the use of Java. The Java approach makes 
it possible to build a valuable tool, as MEVISE is a simple, portable, and 
distributed system. 

In ongoing research work, the MediaSys Video Search Engine will 
be extended with flexible fuzzy search algorithms and new video features 
such as keywords and multi-resolution. Performance evaluation will assess 
the quality of our approach and of the architecture of the MEVISE search 
algorithm. 

Abstract: This paper proposes a method for content-based sports video 
retrieval using camera work information. Since particular camera work for a 
typical scene exists in sports videos, camera work transition becomes an 
effective cue for retrieving a sports scene based on its content. The 
proposed method extracts a series of camera parameters from both a 
user-specified scene of a retrieval key and a video stream of a retrieval 
target, and detects scenes having content similar to that of the key from 
the target by applying continuous DP matching. The method was evaluated 
using a video stream of a baseball game. Recall-Precision curves 
demonstrate its effectiveness. 



Key words: Video, MPEG, DP matching, Content-based retrieval 



1. INTRODUCTION 

Advanced video compression technology such as the MPEG 
video-encoding scheme makes it possible to store large digital video 
archives in computer systems. As the quantity of the video archives 
increases, it becomes impractical for video users to search for their 
favorite scenes from the archives by watching them one by one. A video 
database system is expected to solve this problem, and a great deal of 
research has recently been done on content-based video retrieval 
techniques. To realize content-based video retrieval, semantic information 
representing a scene's content should be assigned to each scene as an index. 
It is, however, impractical to assign all the semantic information by hand 
since each video stream generally includes an enormous number of scenes and 
semantics. Therefore, video retrieval based on the similarity of video 
features is a reasonable approach since such features as a color histogram, 
camera work information and object motions can automatically be extracted 
from video streams (Miyamori et al., 1998)(VisualSEEk)(Chang et al., 
1997)(Smith et al., 1998)(Mohan, 1998). 

This paper proposes a method of content-based sports scene retrieval 
using camera work information as a video feature. In broadcast sports 
programs, semantically the same scenes tend to be captured by the same 
camera work. For example, home run scenes in baseball programs have a 
typical transition of camera work as follows: follow the ball struck by the 
batter, zoom in on the stands where the ball falls, and follow the batter 
running around the diamond. In soccer programs, the following typical 
camera work exists in corner kick scenes: follow the ball kicked from the 
corner to the front of the goal post and zoom in on the front of the goal 
post. Therefore, camera work transition is expected to be an effective cue 
for retrieving similar highlight scenes from sports video programs. 

The proposed method allows users to input a sample scene having some 
semantics as a retrieval key. For example, when a user wants to retrieve 
home run scenes from a baseball game program, he or she inputs a typical 
sample of a home run scene. Camera work information is then extracted 
using motion vectors in an MPEG-encoded video stream from both the 
user-specified sample and the target program. The extracted information 
consists of a series of camera parameters. Similar scenes are detected using 
continuous DP matching between the obtained series of camera parameters. 

The rest of the paper is organized as follows: Section 2 discusses related 
research work that has been reported. Section 3 describes the proposed 
method for content-based sports scene retrieval using camera work 
information. Section 4 presents experimental results obtained in applying 
scene retrieval to a baseball game program. Finally, concluding remarks 
appear in Section 5. 



2. RELATED RESEARCH WORK 

Several content-based scene retrieval methods using video features have 
been proposed so far. One of the methods defines a relationship rule between 
the semantics of a scene and its structure of events (Miyamori et al., 
1998). In this method, each event of a soccer game is defined using a 
short-time action description. The description is composed of a player's ID, 
position, and actions, like (WHO, WHERE, WHAT). Relationship rules can be 
described as "heading shoot = (field player, in front of goal, jump) + 
(keeper, in front of goal, dive)" and "corner kick = (field player, corner, 
kick)". When a user inputs a semantic keyword about a scene he or she is 
interested in, it is interpreted into a set of events based on a rule 
dictionary. Since each description is also related to a scene's physical 
position on a video stream, scenes having user-specified semantics can be 
obtained from the interpreted events. To automatically relate short-time 
action descriptions with their physical scene positions, advanced image 
processing and recognition techniques are required to extract and analyze 
detailed features of a video stream. Considering the actual current level of 
these techniques, this method is impractical since it forces users to define 
almost all the relationships by hand. 

Another existing method allows users to directly input a retrieval key 
with features. It extracts features such as the color, shape, and moving 
direction of objects in a video stream and constructs an index from them. 
Users request scenes they are interested in by specifying object information 
like "a red object moving to the left". This method is a reasonable approach 
since it is based on video features that can automatically be extracted using 
current image processing techniques. It has, however, a drawback in that it 
is difficult to represent complicated movement, such as that which occurs in 
a home run scene in a baseball game, as a retrieval key. 

Another method has been proposed to eliminate the above-mentioned 
drawback, in which a sample scene having complicated features itself can be 
used as a retrieval key (Smith et al., 1998). It extracts color information from 
both a user-specified scene and video streams in a database, and constructs 
feature histograms from each bit of color information. It then detects the 
scenes similar to the user-specified one by matching those histograms. Since 
feature histograms do not include information about the appearance order of 
features in a video stream, however, this method cannot detect scenes based 
on the similarity of motion transition. Furthermore, it cannot consider 
differences in scene length between a user-specified one and similar ones in 
a video database. As a result, for example, a slow motion replay of a user- 
specified home run scene cannot be detected as a similar scene. 

This paper proposes an effective method for content-based sports scene 
retrieval by solving the above-mentioned problems in the existing methods. 
It takes both motion transition similarity and differences in scene length into 
consideration in detecting scenes similar to a user-specified one. 

3. THE PROPOSED METHOD 

3.1 Outline of the Process 

Figure 1 shows the outline of the proposed method. The individual steps 
can be summarized as follows: 

1. Specification of a Retrieval Key 

A user specifies a sample scene he or she wants to retrieve as a retrieval 
key. For example, when a user wants to retrieve home run scenes from a 
baseball game program, he or she inputs a typical sample of a home run 
scene. 

2. Feature Extraction 

Camera parameters are extracted based on the optical flow from both the 
user-specified scene and a target video. Camera parameters give global and 
local motion. Global motion represents the movement of the background 
caused by the movement of a camera. Local motion represents the 
movement of objects captured in a video. In this paper, we use the motion 
vectors included in an MPEG-encoded video stream as optical flow to obtain 
camera parameters. 

3. Similar Scene Detection 

Scenes similar to the retrieval key are detected by feature matching 
between the extracted two series of camera parameters. Though several 
methods have been proposed to perform matching between two series of 
features, we adopt the continuous DP matching method since it effectively 
detects similar scenes by taking both motion transition similarity and 
differences in scene length into consideration. The continuous DP matching 
detects many overlapped scenes. Therefore, our method removes closely 
overlapped scenes from the matching results and gives only meaningful 
scenes. 

Figure 1. Outline of proposed method. 

3.2 Feature Extraction 

To estimate camera parameters (i.e. information on the degree of pan and 
zoom operations) based on the optical flow of video images, we adopt the 
technique described in (Meng et al., 1996), which uses motion vectors in an 
MPEG-encoded video stream as optical flow. Given the motion vector (u, 
v) for each macroblock in a P-picture, the following equation is generally 
satisfied if the vector represents the movement of the background (i.e. 
global motion) through the operations of a fixed-position camera: 

where (x, y) are the coordinates of the macroblock, Gx is the degree of a 
horizontal pan operation, Gy is the degree of a vertical pan operation, and Gz 
is the degree of a zoom operation. For each P-picture, (Gx, Gy, Gz) is 
estimated using least-squares estimation to minimize the error between 
the motion vectors estimated by (1) and the actual motion vectors obtained 
from the MPEG video stream. Not all motion vectors in an MPEG video 
stream, however, represent global motion. Consequently, if the difference 
between the estimated and actual motion vectors is greater than a certain 
threshold, the actual one is excluded from the least-squares estimation. The 
estimation is repeated until no further vector is excluded. 

Motion vectors excluded in the above estimation are considered to be the 
vectors representing the movement of objects (i.e. local motion). The 
average value of these vectors (Lx, Ly) is also treated as a video feature 
and used for feature matching. Therefore, the features extracted from the 
i-th P-picture in an MPEG video stream are represented as follows: 

$$f(i) = (G_x, G_y, G_z, L_x, L_y) \qquad (2)$$ 
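
The estimation loop described above can be sketched as follows. The sketch assumes that equation (1) has the common three-parameter pan/zoom form u = Gx + Gz·x, v = Gy + Gz·y; that form, the function names, the threshold value, and the synthetic motion vectors are all assumptions made for the illustration and are not taken from the cited technique.

```python
import math


def fit_pan_zoom(vectors):
    """Closed-form least-squares fit of u = Gx + Gz*x, v = Gy + Gz*y (assumed model).

    vectors: non-empty list of (x, y, u, v), one entry per macroblock.
    """
    n = len(vectors)
    mx = sum(x for x, _, _, _ in vectors) / n
    my = sum(y for _, y, _, _ in vectors) / n
    mu = sum(u for _, _, u, _ in vectors) / n
    mv = sum(v for _, _, _, v in vectors) / n
    num = sum((x - mx) * (u - mu) + (y - my) * (v - mv) for x, y, u, v in vectors)
    den = sum((x - mx) ** 2 + (y - my) ** 2 for x, y, _, _ in vectors)
    gz = num / den if den else 0.0
    return mu - gz * mx, mv - gz * my, gz


def extract_features(vectors, threshold):
    """Return (Gx, Gy, Gz, Lx, Ly) for one P-picture."""
    inliers, outliers = list(vectors), []
    while True:
        gx, gy, gz = fit_pan_zoom(inliers)
        keep, drop = [], []
        for x, y, u, v in inliers:
            # Difference between the estimated and the actual motion vector.
            error = math.hypot(u - (gx + gz * x), v - (gy + gz * y))
            (drop if error > threshold else keep).append((x, y, u, v))
        # Stop when nothing is excluded (or too few vectors would remain).
        if not drop or len(keep) < 2:
            break
        inliers, outliers = keep, outliers + drop
    # Local motion: average of the excluded vectors (zero if none were excluded).
    lx = sum(u for _, _, u, _ in outliers) / len(outliers) if outliers else 0.0
    ly = sum(v for _, _, _, v in outliers) / len(outliers) if outliers else 0.0
    return gx, gy, gz, lx, ly


# Synthetic picture: pan (2, -1), zoom 0.5, plus one macroblock on a moving object.
vectors = [(x, y, 2.0 + 0.5 * x, -1.0 + 0.5 * y)
           for x in range(-2, 3) for y in range(-1, 2)]
vectors[vectors.index((0, 0, 2.0, -1.0))] = (0, 0, 17.0, -1.0)
features = extract_features(vectors, threshold=2.0)
```

In this example the first fit is biased by the moving-object vector, which is then excluded; the second fit recovers the pan and zoom exactly and the excluded vector becomes the local motion.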

The series of features Fs and Ft obtained from a retrieval key S and a 
target video stream T are represented respectively as follows: 

$$F_s = f_s(1), f_s(2), \ldots, f_s(N_s) \qquad (3)$$ 

$$F_t = f_t(1), f_t(2), \ldots, f_t(N_t) \qquad (4)$$ 

where Ns and Nt are the number of P-pictures in S and T, respectively. 

3.3 Feature Matching 

Our content-based sports scene retrieval is based on the observation that 
semantically the same scenes tend to be captured by the same camera work 
in a broadcast sports program. Therefore, the appearance order of camera 
parameters is very important for detecting similar sports scenes. 
Furthermore, a slow-motion replay scene should always be detected as a 
similar one when its original is given as a retrieval key since their 
semantics are obviously the same. Therefore, a feature matching method must 
be able to consider the differences in length between original and replay 
scenes. Considering these requirements, we adopt continuous DP matching as 
the feature matching method. 

Continuous DP matching compares Fs with a partial section of Ft while 
expanding or contracting Fs, and determines the correspondence between 
fs(i) and ft(j) so that the pattern distance D(k), which represents the 
similarity of features, is minimized. When fs(Ns), i.e. the end of Fs, 
corresponds to ft(k) (1 ≤ k ≤ Nt), D(k) is obtained from the following 
recursive formulas: 

$$D(k) = \frac{1}{k - k' + N_s}\, g(N_s, k) \qquad (5)$$ 

$$g(\tau, 1) = \infty \quad (2 \le \tau \le N_s) \qquad (6)$$ 

$$g(1, \tau) = 2\,d(1, \tau) \quad (1 \le \tau \le N_t) \qquad (7)$$ 

$$g(i, j) = \min \left\{ \begin{array}{l} g(i-1, j-1) + 2\,d(i, j) \\ g(i-1, j-2) + 3\,d(i, j) \\ g(i-2, j-1) + 3\,d(i, j) \end{array} \right. \qquad (8)$$ 

where k' is the value of j when i is 1, i.e. fs(1) corresponds to ft(k'). 
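d(i, j) is the distance between fs(i) and ft(j). In this paper, it is 
defined as follows: 

$$d_{pan} = (G_x(i) - G_x(j))^2 + (G_y(i) - G_y(j))^2 \qquad (9)$$ 

$$d_{zoom} = (G_z(i) - G_z(j))^2 \qquad (10)$$ 

$$d_{local} = (L_x(i) - L_x(j))^2 + (L_y(i) - L_y(j))^2 \qquad (11)$$ 

$$d(i, j) = \omega_1 d_{pan} + \omega_2 d_{zoom} + \omega_3 d_{local} \qquad (12)$$ 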



The degree of influence of global and local motions can be adjusted by 
the values of ω1, ω2 and ω3. Since pan and zoom operations occur 
simultaneously, the degree of influence of dpan and dzoom can be varied 
independently by the values of ω1 and ω2. 

D(k) represents the similarity between a user-specified scene S and a 
partial section of T bounded by k' and k. The higher the similarity is, the 
smaller the value of D(k) becomes. Since the partial sections 
corresponding to D(k) for every k (1 ≤ k ≤ Nt) generally overlap each 
other considerably, redundant sections having large D(k) are removed 
from the matching results. 
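
The matching procedure can be sketched in a few lines. The sketch uses the path weights 2, 3, 3 and the normalization by k - k' + Ns given in the text; the weighted sum of squared pan, zoom, and local differences used as the frame distance, the 0-based indexing, the full-table storage, and the toy feature sequences are assumptions of the sketch rather than details of the prototype.

```python
import math


def feature_distance(fs, ft, weights=(1.0, 1.0, 1.0)):
    """Weighted distance between two (Gx, Gy, Gz, Lx, Ly) feature vectors."""
    w_pan, w_zoom, w_local = weights
    d_pan = (fs[0] - ft[0]) ** 2 + (fs[1] - ft[1]) ** 2
    d_zoom = (fs[2] - ft[2]) ** 2
    d_local = (fs[3] - ft[3]) ** 2 + (fs[4] - ft[4]) ** 2
    return w_pan * d_pan + w_zoom * d_zoom + w_local * d_local


def continuous_dp(key, target, weights=(1.0, 1.0, 1.0)):
    """Return (D, start, end) for every target index where the key can end.

    Indices are 0-based, so row 0 plays the role of i = 1 in the text.
    """
    ns, nt = len(key), len(target)
    g = [[math.inf] * nt for _ in range(ns)]   # accumulated distance
    start = [[-1] * nt for _ in range(ns)]     # k' carried along the best path
    for j in range(nt):                        # a match may begin at any target frame
        g[0][j] = 2 * feature_distance(key[0], target[j], weights)
        start[0][j] = j
    for i in range(1, ns):
        for j in range(1, nt):                 # column 0 stays infinite for i >= 1
            d = feature_distance(key[i], target[j], weights)
            options = [(g[i - 1][j - 1] + 2 * d, start[i - 1][j - 1])]
            if j >= 2:
                options.append((g[i - 1][j - 2] + 3 * d, start[i - 1][j - 2]))
            if i >= 2:
                options.append((g[i - 2][j - 1] + 3 * d, start[i - 2][j - 1]))
            g[i][j], start[i][j] = min(options, key=lambda o: o[0])
    return [(g[ns - 1][k] / (k - start[ns - 1][k] + ns), start[ns - 1][k], k)
            for k in range(nt) if g[ns - 1][k] < math.inf]


# Toy key: a steadily increasing pan. Targets embed it at normal and half speed.
key = [(float(p), 0.0, 0.0, 0.0, 0.0) for p in range(4)]
filler = (9.0, 9.0, 9.0, 9.0, 9.0)
normal = [filler] * 3 + key + [filler] * 3
slow = [filler] * 2 + [f for f in key for _ in range(2)] + [filler] * 2
best_normal = min(continuous_dp(key, normal))
best_slow = min(continuous_dp(key, slow))
```

The half-speed target is matched with zero distance through the (i-1, j-2) transition, which is how a slow-motion replay can still be detected. A production version would keep only the last two rows of the table instead of the whole Ns-by-Nt array.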



4. EXPERIMENTS AND EVALUATIONS 



4.1 Prototype Overview 

We implemented the proposed method on a prototype system to 
determine the method’s validity. Figure 2 shows the system overview. The 
main components are the pre-processing module, the GUI module, the 
similar scene detection module, the sample scene selection module and the 
parameter setting module. 

The GUI module consists of the query composer and the result viewer. It 
allows a user to select a target video stream, specify a scene to be used as 
a retrieval key, and set the parameters of the feature matching. It is 
implemented using CGI and runs in a Web browser. 

Figure 2. Prototype system overview. 

The parameter setting module sets the parameters ω1, ω2 and ω3 
mentioned above and the extraction interval of P-pictures explained below. 
These are referred to by the similar scene detection module. 

The sample scene selection module allows a user to specify a scene to be 
used as a retrieval key. A user can select a key from a sample scene 
database or make one from a target video stream by using the scene cutoff 
tool in this module. A user can also use the retrieved results as retrieval 
keys. 

The similar scene detection module consists of the feature matching and 
the redundancy eliminating components. The feature matching component 
performs the continuous DP matching between two series of features. The 
redundancy eliminating component removes closely overlapped sections and 
sections having large D(k) from the matching results. 
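
One way to realize the redundancy eliminating component is a greedy pass over the matched sections in order of increasing D(k). The text does not give the prototype's rule, so the overlap measure, both thresholds, and the function name below are assumptions chosen for the illustration.

```python
def remove_redundant(sections, max_distance, max_overlap=0.5):
    """Keep the best non-overlapping sections.

    sections: iterable of (D, start, end) with inclusive frame indices.
    max_distance and max_overlap are illustrative thresholds.
    """
    kept = []
    for distance, start, end in sorted(sections):
        if distance > max_distance:
            break                                  # all remaining sections are worse
        redundant = False
        for _, kept_start, kept_end in kept:
            shared = min(end, kept_end) - max(start, kept_start) + 1
            shorter = min(end - start, kept_end - kept_start) + 1
            if shared > 0 and shared / shorter > max_overlap:
                redundant = True                   # closely overlaps a better section
                break
        if not redundant:
            kept.append((distance, start, end))
    return kept


candidates = [(0.1, 10, 20), (0.2, 11, 21), (0.3, 40, 50), (5.0, 60, 70)]
selected = remove_redundant(candidates, max_distance=1.0)
```

Processing sections best-first means that, within a cluster of overlapping matches, the one with the smallest D(k) is the one that survives.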

The pre-processing module extracts features from a target video stream 
in advance since it takes a long time to extract features from a large video 
stream. For example, it takes about 20 minutes for a video stream of an hour 
in length under the configuration described below. On the other hand, feature 
extraction for a retrieval key is performed on demand since a retrieval key is 
generally short and the extraction takes very little time. 

Figure 3 shows a screen shot of the prototype system. The system 
configuration is as follows: 

- PC: Pentium II 400 MHz, 256 MB memory 

- OS: Windows NT 4.0 

- Browser: Internet Explorer 

In the upper left area of Figure 3, a user selects a scene to be used as a 
retrieval key and previews it. In the upper right area, the user selects a 
target video stream. Using the buttons in the upper center area, the user 
sets the parameters of the feature matching and starts scene retrieval. 
Retrieval results are shown in the lower area. 




Figure 3. Screen shot of the prototype system 



4.2 Application to a Baseball Game Program 

The effectiveness of the proposed method was verified by applying it to a 
broadcast baseball game program. The program, used as the target video 
stream, is about 1 hour and 40 minutes long, or about 1 GB in MPEG-1 
compression, and has 171,152 frames. It includes 46,746 P-pictures (i.e. 
Nt = 46,746). Two sample scenes, a home run and an infield grounder, were 
selected from the target as retrieval keys. Table 1 shows the details of the 
retrieval keys. Slow-motion replays of the original scenes are also counted 
in the number of semantically identical scenes. 



Table 1. Details of the retrieval keys 

                                       Home run    Infield grounder 
Picture size                           352x240     352x240 
Scene length                           18 sec      7 sec 
Number of P-pictures                   148         63 
Number of identical semantic scenes    13          11 



Experiments were performed for the following three cases: (a) ω1:ω2:ω3 
= 1:1:1, (b) ω1:ω2:ω3 = 1:1:0, and (c) ω1:ω2:ω3 = 0:0:1. In (a), 
global and local motions exert equivalent influence in the calculation of 
the similarity of scenes. On the other hand, (b) considers only global 
motion and (c) only local motion. 

Figures 4 and 5 show the Recall-Precision curves obtained from the 
results of scene retrievals using the above-mentioned keys. In the home run 
scene retrieval, (a) and (b) give better curves than (c). Furthermore, the 
curve for (a) is almost the same as that for (b). This means that global 
motion is the main factor in the retrieval of home run scenes. This shows the 
validity of our idea that semantically the same scenes tend to be captured by 
the same camera work in a broadcast sports program. 
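
For readers unfamiliar with how such curves are built, the sketch below computes one (recall, precision) point per rank from a ranked result list. The function name and the four-item relevance list are hypothetical; they are not the experimental data.

```python
def recall_precision_curve(ranked_relevance, total_relevant):
    """Return one (recall, precision) point for each rank.

    ranked_relevance: booleans in ranking order, True where the retrieved
                      scene has the same semantics as the key.
    total_relevant:   number of semantically identical scenes in the target.
    """
    points = []
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
        points.append((hits / total_relevant, hits / rank))
    return points


# Hypothetical ranking: the third retrieved scene is a false match.
curve = recall_precision_curve([True, True, False, True], total_relevant=4)
```

With 13 home run scenes in the target, for example, `total_relevant` would be 13 and recall would reach 1.0 only when all of them appear in the ranked list.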

The precision rate in Figure 4 becomes worse when the recall rate is 
greater than 0.8. This was due to the fact that three of the 13 home run 
scenes in the target program were captured with irregular camera work, e.g. 
the camera followed the flight of the ball from the outfield stands. This 
shows that a variety of samples for the same semantic scene should be 
prepared to enhance the precision and recall rates in our method. 

The retrieval result of home run scenes demonstrated the effectiveness of 
continuous DP matching since the slow-motion replay scenes of the given 
retrieval key were unfailingly detected as very similar scenes. Many scenes 
of outfield fly balls were also included; this is because the camera motion for 
capturing outfield fly balls is similar to that for capturing home runs. 

In infield grounder scene retrieval, on the other hand, (b) gives a worse 
Precision-Recall curve than (a). The degree of camera work in an infield 
grounder scene is smaller than that in a home run scene. Furthermore, an 
infield grounder scene generally includes typical object motion such as an 
infielder throwing the ball to first base. Therefore, local motion is also useful 
for the content-based scene retrieval in the proposed method. 




Figure 4. Precision-Recall curve of home run scene retrieval 

Figure 5. Precision-Recall curve of infield grounder scene retrieval 



4.3 Enhancement of Response Time 

In the previous subsection, all features extracted from P-pictures in both 
the retrieval key and the target video stream were applied to the feature 
matching. Since the processing time of the feature matching increases in 
proportion to the number of features, the number of P-pictures extracted 
should be as small as possible to enhance the response time of scene 
retrieval. The smaller the number of extracted P-pictures is, however, the 
worse the recall and precision rates become. The experiments described in 
this subsection clarify how the number of extracted P-pictures affects the 
recall and precision rates of similar scene retrieval in our method. 
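
As a rough worked example of the potential saving, count one distance evaluation per cell of the matching table, i.e. Ns × Nt evaluations. The figures below use the infield grounder key (63 P-pictures) and the target stream (46,746 P-pictures) from Table 1 and the text; they assume that an extraction interval n keeps every n-th P-picture of both the key and the target, and they are operation counts, not measured times.

$$
\begin{aligned}
n = 1:&\quad N_s N_t = 63 \times 46{,}746 = 2{,}944{,}998 \\
n = 2:&\quad \lceil 63/2 \rceil \times \lceil 46{,}746/2 \rceil = 32 \times 23{,}373 = 747{,}936 \approx 0.25\, N_s N_t \\
n = 4:&\quad \lceil 63/4 \rceil \times \lceil 46{,}746/4 \rceil = 16 \times 11{,}687 = 186{,}992 \approx 0.06\, N_s N_t
\end{aligned}
$$

Under these assumptions the count falls roughly with the square of the interval, because both dimensions of the table shrink together.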

The Recall-Precision curves were obtained while the P-picture extraction 
interval was varied from 1 to 5. An extraction interval of 5 means, for 
example, that features are extracted from every fifth P-picture in an MPEG 
video stream. The same infield grounder scene and the same baseball game 
program used for the experiments described in the previous subsection were 
used as the retrieval key and the target video stream, respectively. The 
results obtained are shown in Figure 6. Almost the same curves were obtained 
when the extraction interval was set to 1, 2, 3 and 4. For an interval of 5, 
however, the curve became worse than in the other cases. This is likely 
caused by the structure of an MPEG-encoded video stream, which consists of 
multiple GOPs (Groups Of Pictures). A GOP generally includes four 
P-pictures, as shown in Figure 7. Therefore, at least one P-picture in 
each GOP is used for feature extraction when the extraction interval is set 
from 1 to 4. In these cases, the lack of features in global and local motion 
is considered to have little significance since each P-picture is derived 
from the same I-picture in the GOP through motion compensation. On the 
other hand, the omission has a comparatively strong effect for an interval 
of 5 since many GOPs are skipped in the feature extraction. Though our 
method can be applied in the same manner when features of B-pictures are 
also extracted and used for the feature matching in addition to those of 
P-pictures, these results show that such additional features would not 
enhance the recall and precision rates to any significant degree. 
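
The GOP argument can be checked by counting. The sketch below assumes four P-pictures per GOP, as stated above, numbers the P-pictures consecutively from 0, and assumes that an interval n keeps P-pictures 0, n, 2n, and so on; the function name and the stream length of 100 GOPs are arbitrary choices for the illustration.

```python
def gops_without_sample(interval, p_per_gop=4, num_gops=100):
    """Count GOPs that contribute no P-picture at the given extraction interval."""
    sampled = set(range(0, p_per_gop * num_gops, interval))
    skipped = 0
    for gop in range(num_gops):
        first = gop * p_per_gop
        # A GOP is skipped when none of its P-pictures is sampled.
        if not any(p in sampled for p in range(first, first + p_per_gop)):
            skipped += 1
    return skipped


skipped_by_interval = {n: gops_without_sample(n) for n in range(1, 6)}
```

For intervals 1 to 4 every GOP contributes at least one P-picture, while at interval 5 one GOP in five contributes none, which matches the qualitative change reported for Figure 6.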




Figure 6. Effectiveness of P-picture extraction interval 




Processing time of the feature matching varied as shown in Figure 8. 
These results are understandable since the processing cost of the feature 
matching is O(Ns*Nt), where Ns and Nt are the number of P-pictures 
extracted from a retrieval key and a target video stream, respectively. The 
response time can be considerably shortened when the P-picture extraction 
interval is set to between 2 and 4, while maintaining the recall and 
precision rates at almost the same level as when all P-pictures are used in 
the feature matching. 

Figure 8. Processing time of the feature matching 



4.4 Effectiveness of Local Motion 

Another experiment was performed to clarify the effectiveness of using 
local motion in the proposed method. For the same baseball program target, 
a video clip in which a person turns his hand around twice in front of a fixed 
camera is given as a retrieval key. 

For ω1:ω2:ω3 = 1:1:1, the detected scenes with high similarity mostly 
contain images of a batter making two practice swings with his bat. On the 
other hand, for ω1:ω2:ω3 = 1:1:0, scenes containing object motion similar 
to that in the retrieval key cannot be detected. These results show the 
proposed method has the ability to retrieve scenes on the basis of object 
movement similarity as well. 




5. CONCLUSIONS 

This paper proposed a method of content-based sports scene retrieval 
using camera work information as a video feature. The method extracts a 
series of camera parameters from both a user-specified scene of a retrieval 
key and a video stream of a retrieval target using motion vectors in MPEG 
encoded video streams. Continuous DP matching is adopted for the feature 
matching between a retrieval key and a target stream since it takes both 
motion transition similarity and differences in scene length into 
consideration. 

The method was evaluated using a broadcast baseball game program. 
The result showed the validity of our idea that semantically the same scenes 
tend to be captured by the same camera work in broadcasting sports 
programs. It was verified that the continuous DP matching could adapt to 
differences in scene length since slow-motion replays of a given home run 
scene were unfailingly detected as very similar scenes. It also showed that 
local motion information was effective in such cases as the retrieval of 
infield grounder scenes. 

To improve the response time, we studied how the number of P-pictures 
extracted affected the recall and precision rates in our method. The results 
obtained showed that almost the same recall and precision rates could be 
obtained even if only every fourth P-picture in an MPEG video stream were 
used in the feature matching, while reducing the response time to about 1/16 
of the original value. 
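
The factor of about 1/16 is consistent with the O(Ns*Nt) cost of the feature matching: keeping every fourth P-picture shrinks both Ns and Nt by a factor of four. The sketch below only illustrates this cost model; the function name and the picture counts (400 and 40000) are hypothetical values, not measurements from the experiments.

```python
import math


def matching_cost(n_key, n_target, interval=1):
    """Number of key/target P-picture pairs compared when only every
    `interval`-th P-picture is kept on both sides (cost model O(Ns*Nt))."""
    ns = math.ceil(n_key / interval)
    nt = math.ceil(n_target / interval)
    return ns * nt


full = matching_cost(400, 40000)
for interval in (1, 2, 3, 4):
    print(interval, matching_cost(400, 40000, interval) / full)
```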

Although the response time can be further shortened by decreasing the 
number of P-pictures extracted, it still grows in proportion to the length 
of a target video stream. Therefore, ways to solve this problem 
should be studied in order to apply our method to large video databases. It will 
also be necessary to study the effectiveness of additional features such as color 
information and scene composition in the future. 

In the near future, home server systems that can store large quantities of 
digital video data will come into common use. They will make it possible for 
people to watch any TV program broadcast during a whole week at any time 
they want, even if they forgot to schedule a recording of their favorites. 
A video summarizing function becomes very important in such a situation since 
users need to find their favorites among the enormous number of TV programs 
stored in a home server. The proposed method will help realize this function 
since it can automatically generate a digest of a user’s favorites once a 
sample set of scenes of interest to him or her is registered with the server. 

Abstract We propose a visual exploration technique, “visual exploration by 
example,” for recommendations in a social filtering system. A dynamic 
semantic map of items, rearranged for each query or filtering 
result, explains the result with contextual information 
and helps the user compose a new query. We have also 
introduced visual interaction techniques on the map for examining 
the relationships among filtering results in more detail: coloring by rating 
and highlighting by similarity. We demonstrate an example of the user’s 
exploration using a prototype interface for a movie database. 

Keywords: Information visualization, visual query, social filtering, recommendation, 
information exploration 

1. INTRODUCTION 

Social filtering is an information retrieval technique that utilizes 
knowledge from other users (U. Shardanand and P. Maes, 1995) (P. Resnick 
et al., 1994). It can deal with a user’s subjective “taste” for items such 
as movies and music by computing similarity of the users based on their 
rating patterns; the filtering system provides the user with similar users 
(neighbors) and items they recommend. 

However, even if the filter successfully provides items the user will 
like, it will not automate all of the user’s tasks to obtain information. A 
simple social filter lacks the following functions to help users. 

■ Explanation of filtering results. A user often likes various 
kinds of items with different tastes. Because of this diversity in a 
user’s tastes, filtering results (i.e., items and reviewers) are also 
diverse. The user needs to understand how filtering 
results relate to tastes or interests. 

■ Dealing with specific needs. A user needs a query interface 
to access items actively. A user sometimes has more specific (and 
often temporary) needs than her or his general interests on which 
filtering results are based. In such cases, the user will need to 
explore for items that meet the specific needs. 



■ Avoiding Over-fitting. If a user relies too much on the filter 
to get information and gives ratings only for filtering results, the 
filter, learning only from this feedback, tends to overfit to a portion 
of the user’s interests, and the user might miss potentially useful 
information. As long as the user gives ratings only for the filtering 
results, the filter will continue to provide items according to this 
portion and fail to cover all the tastes. Although the filter can be 
trained by specifying missing items, the user’s needs are often too 
vague to specify. Sometimes, the user is not aware of the needs 
before encountering unexpected but interesting information, which 
is called “serendipitous information.” 

The user needs to explore the information space to find items that 
meet specific needs and items missing from the filtering results. We 
claim that the system should provide contextual information about filtering 
results to explain why some items are included in the filtering results and 
others are excluded. The system should also provide a querying function 
to let the user access items actively. 

In this paper, we propose a visual exploration technique, “visual ex- 
ploration by example,” for recommendations in a social filtering system. 
A dynamic semantic map of items, rearranged for each query or filter 
result, gives explanations of the result with contextual information and 
helps the user to compose a new query. 

We have developed a movie database on the World-Wide Web and 
collected ratings and comments on movies. Our prototype interface has 
been developed to visualize data from this movie database. At the time 
of writing, the database has 6,740 movies, 530 users, 70,416 
ratings, and 6,346 reviews. 

The next section shows an outline of our approach. Section 3 describes 
a query model that enables the user to query items and other users by 
example. Section 4 presents the algorithm to arrange a semantic map 
dynamically. Section 5 introduces interactive visualization functions 
that help the user to browse a map. Section 6 demonstrates an example 
of the user’s exploration. Sections 7 and 8 discuss related work and 
future work, respectively. Section 9 concludes the paper. 

2. VISUAL EXPLORATION BY EXAMPLE 

2.1. EXPLORATION BY EXAMPLE 

To explore an information space for her or his needs, the user requires 
means to understand filtering results and to specify the 
needs to refine the results. Keyword-based filtering can provide a set 
of keywords related to filtering results so that the user can explore the 
information space by selecting some of them. However, social information 
has no such concrete features as keywords. Moreover, social filtering 
often deals with subjective taste, which is hard to describe with keywords. 

Our method, exploration by example, realizes querying and expla- 
nation of recommendation by means of examples instead of specific fea- 
tures. It utilizes similarities between items, or users, derived from rating 
patterns. Exploration is realized as repetition of the user’s querying by 
example and the system’s explanation by example (Figure 1). 

First, the user composes a query with items or other users; giving 
items as a query yields items similar to them and users who like them 
(an item query by item and a user query by item), and giving users as 
a query yields users similar to them and items they like (a user query 
by user and an item query by user). The user can select items and users 
from query results to compose a new query. 

Then, the system provides a query result that includes not only items 
relevant to the query but also less relevant items. The system explains 
items by showing their relationship to other items and users. This con- 
textual information helps the user to understand filtering results and 
find candidates for the next query. 

A query model for exploration by example is described in Section 3. 

2.2. DYNAMIC SEMANTIC MAPS 

To visualize filtering (or query) results with contextual information, 
we have applied a semantic map that automatically composes overall 
information structure of data by laying them out so that geographic 
distances approximate semantic distances (i.e., dissimilarity). 

Although semantic maps are originally meant to provide a static lay- 
out of large data, our map dynamically visualizes filtering results (i.e., 
recommendation) and query results. The map is rearranged for each 
query or filtering result. The algorithm to create a dynamic semantic 
map is described in Section 4. 

The map provides overall contextual information and explains each 
item by visualizing relationship to other items. The user can compose 
and issue a new query by selecting items on the map. 

Figure 1 Exploration by Example (a cycle of “Query by Example” and “Explanation by Example”) 



A query result is visualized together with less similar (less relevant) items. 
Although less relevant items in a query result are generally regarded as 
noise, they can help compose the contextual information on the 
map. 



2.3. VISUALIZING CONTEXTUAL INFORMATION 

The map provides overall contextual information by coloring each item 
to indicate its rating given by the user. The color pattern of the map 
shows the following information: 

■ The scattered pattern of items with high ratings clarifies the user’s 
implicit information needs and gives landmarks useful for 
understanding the layout. 

■ Recommended items can be explained by their spatial 
relationship to rated items. 

■ Outlying items with high ratings suggest the user’s potential 
information needs. 

Figure 2 is a simplified illustration of a map, which contains two 
major clusters of nodes with high ratings, A and B, and an outlying 
node P. The clusters indicate that most of the user’s interests fall into 
two categories. If the user is interested in one of these categories more 
than the other, he or she can browse the recommended items around the 
cluster. If the user wants something different from the recommendation, 
the area around the node P can be a starting point for exploration. From 
a statistical viewpoint, P is regarded as an outlier to be excluded. From 
the user’s viewpoint, however, the fact that this node lies far from the 
clusters can be a clue to a new interesting category. 

Figure 2 Color pattern of map 

To enable the map to explain the relationships among filtering 
results in more detail, we have also introduced visual interaction techniques: 
coloring by rating of selected users and highlighting by similarity to selected 
items, as described in Section 5. 

3. A QUERY MODEL 

Ratings. For each movie m in the database, a user u can give a 
rating rating(m, u) and a review review(m, u). To distinguish users’ 
roles in social exploration, we refer to a user that has given reviews as a 
“reviewer.” A rating value is an integer that ranges from 1 (awful) to 
5 (excellent), and the users are supposed to assume the boundary between 
positive and negative ratings at 2.5. If u has not given a rating on m, 
rating(m, u) has the default value 2.5. 

Similarity Measure. We have applied a simple vector space model 
to compute similarities based on rating patterns. The feature vectors 
$M_i$ of the movie $i$ ($i = 1, \ldots, n$) and $U_j$ of the user $j$ ($j = 1, \ldots, m$) are 
represented as: 

$$M_i = \langle r_{i1}, \ldots, r_{im} \rangle, \qquad U_j = \langle r_{1j}, \ldots, r_{nj} \rangle$$

where $r_{ij} = rating(i, j) - 2.5$. Note that, for the movie $i$ that has not 
been rated by the user $j$, $r_{ij}$ is considered to be 0. The similarity between 
two movie vectors or two user vectors is defined as their cosine coefficient 
(i.e., normalized inner product). 

The feature vector to represent multiple movies or users can be defined 
simply as their summation normalized by the length: 

$$\mathrm{and}(M_i, M_j) = \frac{M_i + M_j}{|M_i + M_j|}$$


Based on the similarity measure, movies can be retrieved by a query 
that consists of multiple movies and a certain threshold. 
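
The similarity measure and the combined query can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the three users, and the ratings are all made up, and the cosine coefficient is computed explicitly from unnormalized vectors.

```python
import math

DEFAULT = 2.5  # value used for a missing rating, so that r_ij becomes 0


def movie_vector(movie, users, ratings):
    """Feature vector of a movie over the user space: rating - 2.5 per user."""
    return [ratings.get((movie, u), DEFAULT) - DEFAULT for u in users]


def cosine(a, b):
    """Cosine coefficient of two vectors (0.0 if either has zero length)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def combine(a, b):
    """and(a, b): the sum of two vectors normalized by its length."""
    total = [x + y for x, y in zip(a, b)]
    length = math.sqrt(sum(x * x for x in total))
    return [x / length for x in total] if length else total


def query_movies(query_vec, vectors, threshold):
    """Movies whose similarity to the query is at least `threshold`, best first."""
    scored = [(cosine(query_vec, vec), movie) for movie, vec in vectors.items()]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


# Made-up ratings: (movie, user) -> integer rating from 1 to 5.
users = ["u1", "u2", "u3"]
ratings = {("A", "u1"): 5, ("A", "u2"): 4,
           ("B", "u1"): 5, ("B", "u2"): 5,
           ("C", "u1"): 1, ("C", "u3"): 5}
vectors = {m: movie_vector(m, users, ratings) for m in ("A", "B", "C")}
print(query_movies(combine(vectors["A"], vectors["B"]), vectors, threshold=0.5))
```

In this toy data, a query composed of movies A and B retrieves A and B themselves and leaves out C, which is rated in the opposite direction.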



Recommendation. The system can recommend items based on one 
of the common algorithms of social filtering (U. Shardanand and P. Maes, 
1995). Users whose similarity to the user $i$ is greater than a certain 
threshold $t$ are called neighbors of $i$ ($N(i)$). For each item $k$ which is 
not rated yet by the user $i$, a predicted rating $p(k, i)$ is computed as the 
mean of neighbors’ ratings weighted by their similarities: 

$$p(k, i) = 2.5 + \frac{\sum_{j \in N(i)} sim(U_i, U_j) \cdot r_{kj}}{\sum_{j \in N(i)} |sim(U_i, U_j)|}$$
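
The predicted rating can be sketched in a few lines. The sketch assumes that user similarities are already available as a nested dictionary and that a neighbor who has not rated the item contributes 0 to the numerator, following the default rating of 2.5; the function name and the sample values are hypothetical.

```python
def predict_rating(item, user, sims, ratings, threshold, default=2.5):
    """Similarity-weighted mean of the neighbors' ratings of `item`, expressed
    as an offset from the default rating (2.5) and added back to it."""
    neighbors = [j for j, s in sims[user].items() if j != user and s > threshold]
    denominator = sum(abs(sims[user][j]) for j in neighbors)
    if denominator == 0:
        return default  # no neighbors: fall back to the neutral rating
    numerator = sum(sims[user][j] * (ratings.get((item, j), default) - default)
                    for j in neighbors)
    return default + numerator / denominator


# Made-up similarities to the user "me" and ratings of the item "X".
sims = {"me": {"a": 0.8, "b": 0.4, "c": 0.1}}
ratings = {("X", "a"): 5, ("X", "b"): 4, ("X", "c"): 1}
print(predict_rating("X", "me", sims, ratings, threshold=0.3))
```

With a threshold of 0.3, only "a" and "b" are neighbors, so the low rating given by the dissimilar user "c" does not affect the prediction.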



Focused Recommendation. Given a similarity measure between 
a movie and a user (reviewer), the system can provide more focused 
recommendations than those of social filtering results. 

The similarity between a movie and a user can be introduced by 
regarding a user $j$ as a vector on the movie space: 

$$\langle u_1, \ldots, u_m \rangle \quad (u_j = 1,\ u_k = 0\ (k \neq j))$$

Based on this similarity, the user can search movies by users (to obtain 
recommendation from specific users) or users by movies (to obtain fans 
of specific movies). 

The system provides a focused recommendation as follows. The user 
issues a user query with movies that interest him or her and obtains 
reviewers similar to these movies, that is, fans of these movies. By 
composing a movie query with these reviewers, the user obtains new 
movies as recommendations from them. 
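
The two-step procedure can be sketched as follows, under the assumption that a user is represented by a unit vector in the movie space and that several users are combined by a normalized sum. The function names, the four users (indexed 0 to 3), and the movie vectors (already expressed as rating minus 2.5) are made up for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def cosine(a, b):
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def fans_of(movie_ids, movie_vecs, n_users, top=2):
    """User query by movies: the similarity between the combined movie vector
    and the unit vector of user j is the j-th component of the normalized
    query vector, so users are ranked by that component."""
    q = normalize([sum(movie_vecs[m][j] for m in movie_ids) for j in range(n_users)])
    return sorted(range(n_users), key=lambda j: q[j], reverse=True)[:top]


def movies_from(user_ids, movie_vecs, n_users, exclude=()):
    """Movie query by users: rank movies by similarity to the combined
    unit vectors of the selected users."""
    q = normalize([1.0 if j in user_ids else 0.0 for j in range(n_users)])
    scored = [(cosine(vec, q), m) for m, vec in movie_vecs.items() if m not in exclude]
    return [m for _, m in sorted(scored, reverse=True)]


# Made-up movie vectors over four users.
movie_vecs = {"Seen": [2.5, 1.5, -1.5, 0.0],
              "X": [2.5, 2.5, 0.0, -1.5],
              "Y": [-1.5, 0.0, 2.5, 2.5]}
fans = fans_of(["Seen"], movie_vecs, n_users=4)
print(fans, movies_from(fans, movie_vecs, n_users=4, exclude=["Seen"]))
```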

For example, suppose that the user is interested in a movie he or she 
has seen recently but does not know much about that kind of movie. 
The user can search for fans of the movie by issuing a user query by movie 
and get recommendations, that is, what else they like and how they 
comment on them. 

4. DYNAMIC SEMANTIC MAPS 

4.1. FORCE-DIRECTED LAYOUT 

Since the similarity of movies is defined on the user space, where the 
number of dimensions is as large as the number of users, the relationship 
of these similarities cannot be perfectly projected on a 2 or 3 dimensional 
display. To represent the original relationship on the display as completely 
as possible, several optimization techniques have been developed (X. Lin 
et al., 1991) (M. Chalmers and P. Chitson, 1992) (J. A. Wise et al., 1995). 
We have taken one of these techniques, a force-directed layout approach. 

There are some studies of heuristic algorithms for drawing undirected 
graphs based on force-directed models (e.g., (T. M. J. Fruchterman and 
E. M. Reingold, 1991)). In a typical model, each pair of vertices is 
linked with a spring whose length represents the ideal distance between 
them. The vertices are placed in some initial layout and the spring forces 
on the vertices move the system into a minimal energy state. 

We apply this spring model by defining the length of a spring between 
movies or users as: len(a, b) = 1 − sim(a, b). 

The placement of the nodes is optimized by n-body simulation. The 
simulator moves each node iteratively on the two dimensional display 
according to the force between nodes. 

The force-directed layout approach has the following features that are 
useful to create a map optimized for a query result dynamically. 

■ Various constraints and forces can be added to the model. For 
example, we can fix the place of a particular object to control the 
layout. 

■ The iterative process of optimization can be shown as animation. 
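
A minimal sketch of such a spring relaxation is given below. It is not the simulator used in the prototype: the linear spring force, the step size, and the function names are assumptions, and the three-node similarity values are made up. The generator yields the positions after every iteration, so a caller could draw each frame as an animation, and nodes listed in `fixed` are never moved.

```python
import math
import random


def spring_layout(nodes, sim, pos=None, fixed=(), steps=200, rate=0.05, seed=0):
    """Yield node positions after each iteration of a simple spring relaxation
    in which the ideal length of the spring between a and b is 1 - sim(a, b)."""
    rng = random.Random(seed)
    pos = dict(pos or {})
    for n in nodes:
        pos.setdefault(n, (rng.random(), rng.random()))  # random start if unknown
    for _ in range(steps):
        new_pos = {}
        for a in nodes:
            if a in fixed:  # constraint: a fixed node keeps its place
                new_pos[a] = pos[a]
                continue
            ax, ay = pos[a]
            fx = fy = 0.0
            for b in nodes:
                if a == b:
                    continue
                dx, dy = pos[b][0] - ax, pos[b][1] - ay
                d = math.hypot(dx, dy) or 1e-9
                stretch = d - (1.0 - sim(a, b))  # > 0: the spring pulls a toward b
                fx += stretch * dx / d
                fy += stretch * dy / d
            new_pos[a] = (ax + rate * fx, ay + rate * fy)
        pos = new_pos
        yield dict(pos)


def stress(nodes, sim, pos):
    """Sum of squared differences between displayed and ideal distances."""
    total = 0.0
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            d = math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])
            total += (d - (1.0 - sim(a, b))) ** 2
    return total


# Made-up similarities between three nodes.
nodes = ["a", "b", "c"]
pair_sim = {frozenset(("a", "b")): 0.9,
            frozenset(("a", "c")): 0.1,
            frozenset(("b", "c")): 0.1}
sim = lambda x, y: pair_sim[frozenset((x, y))]
start = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 1.0)}
frames = list(spring_layout(nodes, sim, pos=start, fixed={"a"}))
print(stress(nodes, sim, start), stress(nodes, sim, frames[-1]))
```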

4.2. SMOOTH TRANSITION OF MAPS 

While other applications of a semantic map visualize overall informa- 
tion structure of the entire database, our map is designed to explain 
query or filter results. Instead of visualizing all the items, a portion of 
the items is selected from the database to show focused information with 
contextual information for each query result. 

To visualize each query result, the layout is rearranged for the new 
set of nodes. Note that we have taken an optimization algorithm to lay 
out the nodes; this algorithm does not always compute a unique layout 
for a given semantic space and the meaning of spatial position on the 
map is therefore variable. If each layout is optimized independently, the 
user cannot quickly understand the new map. To enable the user to shift 
the viewpoint from the old map to the new map smoothly, the new map 
should be as consistent with the old map as possible. 

The new map usually has some nodes that were already on the 
old map. To help the user to understand the new map, the continuity 
of these nodes is kept as follows: (1) If a node has been shown already 
on the old map, the system uses its coordinates on the old map 
as the initial coordinates of the node. (2) The system visualizes the 
optimization process of the layout as an animation. 
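
Rule (1) amounts to seeding the optimizer with the old coordinates. The sketch below shows one way to build such a starting layout; how nodes that are new to the map are initialized is not stated in the text, so the random placement used here, like the function name, is an assumption.

```python
import random


def initial_coordinates(new_nodes, old_pos, seed=0):
    """Starting positions for the new map: nodes already shown keep their old
    coordinates; other nodes get a random position (an assumption)."""
    rng = random.Random(seed)
    start = {}
    for node in new_nodes:
        if node in old_pos:
            start[node] = old_pos[node]  # rule (1): reuse the old coordinates
        else:
            start[node] = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
    return start


old_map = {"m1": (0.2, 0.3), "m2": (-0.5, 0.1), "m3": (0.9, -0.4)}
print(initial_coordinates(["m2", "m3", "m4"], old_map))
```

Nodes that appear only on the old map ("m1" here) are simply dropped, and the optimization then proceeds from this starting layout.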

Figure 3 shows an example of shifting a viewpoint by querying. The 
map on the left is an old map presented before submitting a query to 
the database. The visualized movies are the result of the previous query 
“Back to the Future.” The movie “Dial M for Murder” is selected and 
highlighted as a bigger node on the map. The map on the right in Figure 
3 has been obtained by issuing the movie as a query. A dark gray node 
is an old node that has been shown on the previous map and a light 
gray node is a new node. The change in the popularity of the movies 
has pushed the selected node to the center of the map. The area that 
includes “Back to the Future” on the old map has been shrunk into 
a peripheral area on the new map. 

Figure 3 Shifting the viewpoint: (left) before the query; (right) after the query 



5. BROWSING A MAP 

5.1. VISUALIZING FUNCTIONS 

We have introduced two visualizing functions that help the user to 
browse relationships between items: coloring by rating and highlighting 
by similarity. 

Coloring by Rating. As described in Section 2.3, the map colors 
the items according to the user’s ratings in order to help the user to 
understand the layout. The map can also display color patterns according 
to other users’ ratings. The user can get a list of reviewers as the result 
of a user query by movie. When some reviewers are selected from the 
list, the color of each node on the map changes according to their ratings. 
Given color patterns on the map, the user can understand their tastes 
in movies and browse movies from their viewpoints. 

The colors are printed in gray scale in this paper (darker gray indicates 
a higher rating), although we use a warm color (red) for high ratings and 
a cold color (blue) for low ratings on the actual system. 

Highlighting by Similarity. The layout of nodes gives only 
approximate information on the actual semantic distances. To represent 
more detailed relationships between these nodes, we have introduced a 
dynamic filter that highlights nodes similar to specified nodes. The 
user can browse details of the information space by selecting nodes and 
controlling the threshold of similarity. As the threshold is lowered, the 
area of highlighted nodes spreads out gradually. Figure 4 shows an 
example of a map with highlighted nodes. The round, biggest node with 
a shadow is the selected node and the rectangles with shadows are the 
highlighted neighbor nodes. 
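
The dynamic filter can be pictured as a simple threshold test. In this sketch the similarities of the displayed nodes to the current selection are assumed to be precomputed; the function name and the similarity values are made up.

```python
def highlight(similarity_to_selection, threshold):
    """Return the nodes whose similarity to the current selection is at
    least `threshold` (a higher threshold highlights fewer nodes)."""
    return {node for node, s in similarity_to_selection.items() if s >= threshold}


# Made-up similarities of four displayed nodes to the selected node.
sims = {"n1": 0.9, "n2": 0.7, "n3": 0.4, "n4": 0.1}
for threshold in (0.8, 0.5, 0.2):
    print(threshold, sorted(highlight(sims, threshold)))
```

Lowering the threshold can only add nodes to the highlighted set, which is why the highlighted area spreads out gradually.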

5.2. DYNAMIC QUERYING AND EXPLANATION 

By visualizing detailed relationships between nodes as scattered patterns 
of colored nodes and highlighted nodes, the system helps the user to 
select known items (i.e., items with the user’s rating) as a query and to 
understand unknown items (i.e., items without the user’s rating). 

Dynamic query by known items. By selecting a known item, 
the user can dynamically extract similar items from the map. The area 
of highlight shows which items are near in the original semantic space and 
eliminates the occlusion by other, irrelevant nodes. 

Whenever any combination of the nodes is selected, the area of 
highlight changes dynamically in order to represent the neighbors of the new 
selection. The selection of the nodes is therefore regarded as a “dynamic 
query” for the visualized subset of the database. 

The highlighted nodes (i.e., dynamic query results) help the user to 
find items to select next. 

■ When one of the highlighted nodes is selected, and the original 
selections are unselected, the area of highlight shifts to show the 
neighbor nodes of the new selection. By selecting one highlighted 
node after another, the user can traverse the map according to the 
similarity. 

■ When one of the highlighted nodes is selected and added to the 
original selection, the area of highlight shows the result of this 
more specific query composed with multiple movies. By modifying 
the combination of selected items, the user can refine dynamic query 
results for a particular need. 

After exploring combinations of items, the user can issue a new query 
with the combination to get a new set of items. 

Dynamic explanation for unknown items. By selecting an 
unknown item, the user can obtain a scattered pattern that explains the 
features of the item. Since the similar items are scattered over the map 
according to their similarities to the other items, their positions indicate 
what kinds of movies are similar to the selected movie. 

Figures 4 and 5 show snapshots of highlighted patterns of a map that 
visualizes movies based on user ratings in our movie database. In these 
snapshots, two different movies, “Back to the Future” and “Star Wars,” 
are selected, respectively. These two sci-fi movies, rated similarly by the 
users, are laid out close to each other. The highlighted patterns, however, 
reveal the difference in their details. Although these two patterns share 
some nodes, they consist of different sets of nodes spreading differently: 
“Back to the Future” has similar movies in the northeast such as “Top 
Gun” and “Ghost,” and “Star Wars” has similar movies in the south 
such as “Blade Runner” and “2001: A Space Odyssey.” In fact, the 
former is more popular among general fans, while the latter is welcomed 
by sci-fi specialists. 

The explanation given by scattered patterns could be obtained 
neither from an ordered list of the similar items nor from a semantic 
map that includes only the nearest neighbors. While the highlighted 
nodes provide the focused information, the other, less relevant nodes 
compose the overall arrangement of the map and provide contextual 
information. 

5.3. OTHER FEATURES 

We have implemented a prototype interface based on our method. 
This interface has the following features in addition to the above two 
visualizing functions. 




Visual Exploration for Social Recommendations 




Figure 4 Highlighted items similar to “Back to the Future” 

Figure 5 Highlighted items similar to “Star Wars” 

Fisheye View. The map uses a fisheye view technique that 
integrates local detail and global context seamlessly on a limited display 
surface. We adopt a hyperbolic fisheye view (V. Hovestadt, 1995) (J. 
Lamping et al., 1995) that transforms infinite data planes into a finite 
circle. Figure 7 demonstrates an example of fisheye views. 

Labeling. Each item is visualized as a small rectangle without any 
label. When the user points at an item with the mouse cursor, the system 
displays the title of the item. 

Listing Reviewers. When the user selects items on the map, the 
system dynamically lists reviewers similar to these items (i.e., fans of 
these movies). Selecting some of the reviewers colors the map according 
to their ratings. 

Listing Items. The system dynamically lists the titles of highlighted 
items. Although the user can get the title of an item by pointing at the 
node with the mouse cursor, the list of the titles shows the contents of 
the area of highlight more concisely. 

6. EXAMPLE 

In this section, we demonstrate how the dynamic semantic map 
helps the user to understand filtering results, find items missing from the 
results, explore for specific needs, and get a focused recommendation. 

Visualizing the user’s tastes. Figure 6 shows a map visualizing 
the user’s tastes as a color pattern. The two maps in Figure 6 are identical 
but the mouse cursor points at different items. The map consists 
of the 600 most popular movies in the database colored by the user’s 
ratings. As can be seen, the map includes some clusters of black nodes 
(i.e., items with high ratings) indicating areas the user likes. At the center 
of the map, there is a major cluster that includes several dozen movies 
the user likes. On the left map, the cursor points at one of the movies in the 
major cluster and the map displays the title of the movie, “Back to the Future.” 
Far from the major cluster, there are several minor clusters that have only a 
few movies. On the right map, the user points at a movie, “Les Amants du 
Pont-Neuf,” in one of these clusters. This color pattern shows the diversity 
of the user’s tastes in movies. 

Figure 6 Visualized the user’s tastes 

Visualizing recommendation. Figure 7 shows the same map as in 
Figure 6, highlighting the 50 most highly recommended movies. The 
recommendations are located around the major cluster of movies the user 
likes. The fisheye view can zoom in on the area of the recommendations 
as can be seen in the right map in Figure 7. 

The user can browse each recommendation with contextual 
information. The location of a recommended movie roughly indicates its 
relationship to the movies in the major cluster. To examine detailed 
relationships, the user can select the recommended movie to highlight similar 
movies. 

Revealing overfitted results. Although the filter shows the most 
highly recommended items, this recommendation does not cover all 
the areas of movies the user will like. The area of highlighted movies 
covers the major cluster of movies with high ratings but misses other small 
clusters. There are too few rated movies for the filter to recommend 
movies around there with confidence. Even if the user likes the movies the 
filter recommends, he or she might not be satisfied with a result that 
misses some kinds of movies the user likes. By visualizing filtering results with 
contextual information as in Figure 7, the system makes the user aware of 
this situation. 

Exploration for something else. In search of possibly interesting 
movies that have been excluded from the recommendations, the user 
points at a movie, “Smoke,” in one of the small clusters. The map highlights 
similar movies: some of the movies have been rated and others are 
unknown movies. Zooming in on the area around “Smoke,” the user can 
examine relationships between unknown movies and known movies. 

Figure 7 Visualized recommendation: normal view (left), fisheye view (right) 

Figure 8 Selecting an item in search of missing items 

To refine highlighted movies, that is, dynamic query results, the user 
selects one of the highlighted movies, “Bagdad Cafe,” and obtains a list 
of movies similar to the selected movies (Figure 9). 

Figure 9 Refining dynamic query results: highlighted items (left), list of the titles 
(right) 



Retrieving reviewers. Whenever the user selects movies on the 
map, the system dynamically lists reviewers who like these movies. In 
Figure 10, the system lists reviewers who like both of the selected movies 
in Figure 9. By selecting some of the reviewers in the list, the user can 
have the map colored by their ratings (Figure 11). This color pattern 
represents their tastes in movies. After understanding their tastes, the 
user can get their recommendations if needed: the user issues a movie 
query by users and obtains a new map that includes their 
recommendations. 



7. RELATED WORK 

Social Query Models. The Tapestry system (D. Goldberg, et al., 
1992), which coined the term “collaborative filtering,” is a mail system 
that filters mail or news articles based on annotations given by other 
users. The system supports a query language TQL that enables the 
user to find articles that meet specific needs. The user, for instance, 
can acquire articles recommended by a specified person. The system, 
however, is meant to support a work group among which the user can 
specify actual users to get information. In a larger and open community 
on the Internet, the user does not always know appropriate people to ask 
for recommendation. Our system enables the user to find appropriate 
reviewers out of unknown reviewers: the user can retrieve reviewers by 
items of interest and examine their interests or tastes as color patterns. 

Figure 10 List of reviewers who like the movies of interest 

Figure 11 Movies colored by the reviewers’ ratings 

Visual Querying. VIBE (K. A. Olsen and R. R. Korfhage, 1994)
and GUIDO (A. Nuchprayoon and R. R. Korfhage, 1994) are tools that
visualize a document collection according to its relationship to multiple
keywords (or reference points). VIBE visualizes documents according to
a user’s “points of interest,” defined by keywords and their display
positions. Each document is placed at the center of gravity of the
keywords, which are weighted by factors derived from each term’s
frequency; documents are thus clustered geographically according to the
positions of the selected keywords. GUIDO visualizes a two-dimensional
distance space in which documents are plotted according to their
distances from two reference points. These tools can explain a query
result by its relationship to the keywords and enable the user to
explore the document space by selecting multiple keywords.
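
The center-of-gravity placement described above can be illustrated with a
minimal sketch. It is not taken from the VIBE implementation: the function
name place_document, the keyword coordinates, and the use of raw weights as
term-frequency scores are all assumptions made for illustration.

```python
def place_document(weights, keyword_positions):
    """Return the (x, y) center of gravity of the displayed keywords.

    weights: keyword -> non-negative weight (assumed: a term-frequency score)
    keyword_positions: keyword -> (x, y) display position chosen by the user
    """
    total = sum(weights.get(k, 0.0) for k in keyword_positions)
    if total == 0:
        return None  # the document matches none of the displayed keywords
    x = sum(weights.get(k, 0.0) * pos[0] for k, pos in keyword_positions.items()) / total
    y = sum(weights.get(k, 0.0) * pos[1] for k, pos in keyword_positions.items()) / total
    return (x, y)


# Hypothetical points of interest and one hypothetical document.
keyword_positions = {"database": (0.0, 0.0), "visual": (4.0, 0.0), "query": (0.0, 4.0)}
document = {"database": 1.0, "visual": 3.0}
position = place_document(document, keyword_positions)
print(position)
```

A document that mentions "visual" three times as heavily as "database"
lands three quarters of the way from the "database" point to the "visual"
point, so documents with similar keyword profiles end up close together.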

One of the principal differences between our system and these tools is
that any of the visualized items can be selected as a reference point by
direct manipulation. Both movies and users can serve as reference points
and as subjects of retrieval. The user can compose reference points by
selecting items from the query results visualized on the map. The result
of the selection (i.e., the highlighted nodes) even suggests another
reference point to select.

This feature of our system is especially important for exploring a
similarity space when no explicit keyword is available.

8. FUTURE WORK 

Evaluation by user testing. To evaluate the effectiveness of
our technique, we need to test the system with users performing tasks.
Tasks will include understanding users’ tastes visualized on the map,
getting recommendations from reviewers with specific tastes, and
serendipitous discovery of movies. Defining performance measures for
such tasks is also a challenging piece of future work.

Smart labeling of items. One of the problems of a semantic map is
a lack of comprehensibility at first sight. Once the user becomes familiar
with the overall arrangement of the map, it helps the user to browse
items. However, unlike a scatterplot of two-dimensional data with x-y
axes, a semantic map shows an emerging structure of multi-dimensional
data that the user does not know in advance. Landmarks on the map
are thus important in helping the user understand the arrangement.
Since the current map shows labels only for the item pointed at and the
items selected, the user has to move the cursor around the map to
understand the arrangement. To give the user a comprehensive
“at-a-glance” map of items, we have to develop a method for selecting
appropriate items as landmarks.

Combination with content-based navigation. The current
system uses only social similarity to help users navigate. However, rating
data is not the only information that represents features of items. In the
case of our movie database, a movie has attributes such as title, year,
countries, genres, directors, and cast. Whereas social information enables
the user to explore items and reviews according to subjective tastes, these
attributes let the user explore by specific features. Our next research
issues include developing navigation techniques that use both social
information and content information.

9. CONCLUSION 

We proposed a visual exploration technique, “visual exploration by
example,” for recommendations in a social filtering system. The user’s
querying by example and the system’s explanation by example are
repeated on a dynamic semantic map. Rearranged dynamically for each
query or filtering result, the map explains the result with
contextual information and helps the user to compose a new query.

We also introduced visual interaction techniques for the map that
explain the relationships among filtering results in more detail: coloring
by rating and highlighting by similarity. Color patterns on the map
according to reviewers’ ratings help the user to understand their tastes in
movies and browse movies from their viewpoints. Highlighted patterns of
similar items visualize the relationships of selected items to the others.

We demonstrated an example of the user’s exploration using a
prototype interface for a movie database.

Abstract Many real-world data warehouse applications involve navigation of large, highly
connected hierarchies, e.g., web pages, product catalogs, and document hot-topic
hierarchies. Quite often users are confused and overwhelmed by many complex
displays. This paper discusses a new invisible link technique for linking a large, highly
connected graph in a hyperbolic space without cluttering the display. Only the primary
links are shown to the user. All other cross-links are hidden in the properties of each
node and are invisible to the user. These invisible links appear only when the user focuses
on the node. The “invisible link” technique allows a user to freely focus on the hierarchy of interest.



Keywords: Large Hierarchical Space, Invisible Links, Navigation, Hyperbolic Space. 

1. INTRODUCTION 

Today many web applications involve navigation of large highly connected 
hierarchies e.g. web pages, product catalogs, and document topic hierarchies 
[4,5,6]. For example, in data mining, there is an immediate need for users to 
visualize the content and usage of the web. A difficult problem to solve is how 
to navigate through millions of interconnected documents to access 
information on one display. 

Hyperbolic space provides an elegant solution for displaying large hierarchies
on a user screen. It differs from the conventional approach of laying out
trees in Euclidean space. In Euclidean space, the area of a circle that
contains nodes grows only polynomially (quadratically) with its radius. In
hyperbolic space, the area of a circle grows exponentially with respect to
its radius. As a result, the approach described in [2,4] can handle a graph
of over 20,000 documents on the web by using a focus and context scheme.
Hyperbolic space allows a user to navigate through the nodes and to see the
relationship of the visible portion of the space to the entire structure on
one single display. This is unlike MindMan [3], which requires multiple
displays to represent large amounts of data, so that the user has to click
through display after display to find the information
he/she needs.
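
The difference in growth rates can be checked numerically with a short
sketch. It assumes the standard hyperbolic plane of curvature -1, in which a
circle of radius r encloses an area of 2π(cosh r − 1); the function names
are illustrative and are not taken from the systems cited above.

```python
import math


def euclidean_disk_area(r):
    """Area enclosed by a circle of radius r in the Euclidean plane."""
    return math.pi * r ** 2


def hyperbolic_disk_area(r):
    """Area enclosed by a circle of radius r in the hyperbolic plane
    (assumed curvature -1)."""
    return 2.0 * math.pi * (math.cosh(r) - 1.0)


# For small radii the two areas nearly coincide; for larger radii the
# hyperbolic area pulls away exponentially.
for r in (0.5, 1, 2, 4, 8):
    ratio = hyperbolic_disk_area(r) / euclidean_disk_area(r)
    print(f"r={r}: hyperbolic/euclidean area ratio = {ratio:.2f}")
```

This extra room at the rim is what lets each level of a tree receive
roughly as much space as the level above it, however deep the hierarchy.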

Based on our practical usage and design experience, we have discovered
that hierarchical parent-child tree structures are too restrictive. Often, there are
relationships that need to link different branches of a tree [1]. They
may even form cycles. For example, in a customer support web application,
hyperbolic trees are used to organize large numbers of questions and answers
in a hierarchical structure. Questions are parent nodes; answers are arcs
(links) to child nodes. It is sometimes necessary for an answer to link to a set
of questions and answers in another group that does not directly belong to the
hierarchical hyperbolic tree. As a result, the generalization to a hyperbolic
space is necessary. However, drawing all of these links adds extra lines
among the connected nodes in the hyperbolic space, which introduces
thousands of lines and intersections. Also, links to faraway nodes that are
off the screen would appear as “broken” lines. The hyperbolic space becomes
very cluttered and difficult to visualize.

The graph in Figure 1 (A) illustrates the difficulties of visualizing the 
multiple paths with existing methods. This graph is a hyperbolic space with 
many highly connected paths. It contains n nodes, n-1 primary edges, cycles, 
and many extra cross edges and intersections. This hyperbolic graph becomes 
very cluttered. 

To date, many practical applications have shown the usefulness of
hyperbolic trees for navigating large hierarchies with millions of nodes. This
paper generalizes a hyperbolic tree into a hyperbolic space and provides
techniques to unclutter the display.

Figure 1 (A) A Cluttered Hyperbolic Tree

Figure 1 (B) New Method: A Cyclic Hyperbolic Space with Invisible Links

2. A NEW TECHNIQUE

In this paper we discuss a new “invisible link” technique with a
placeholder. This technique is used for visualizing a large, highly connected
graph in a hyperbolic space without cluttering the display. During the
visualization, only the primary links are shown. All other types of links
(non-tree/secondary links) are made invisible and hidden in the properties of
a node. The placeholder is designed to track the links, including cyclic
conditions. These invisible links appear only when the user navigates to or
clicks on a node. This technique allows a user to freely focus on the
hierarchy of interest. The graph in Figure 1 (B) illustrates a new hyperbolic
space with multiple invisible links and placeholders. It contains the same
n nodes, n-1 primary edges, and cycles as Figure 1 (A).

2.1 Definitions 

In a directed hierarchical space with or without cycles, there is a primary 
graph, which links all the nodes in a tree form. These links are primary tree 
links. The others are non-tree/secondary links in a highly connected graph. A 
node can have one incoming primary link and many incoming
non-tree/secondary links, which are called invisible links. A hyperbolic space is
defined as a directed hierarchical hyperbolic graph with cross-links and cycles. 
Figure 2 illustrates the definition of a hyperbolic space with cycles. 

Primary Path (tree link): A directed non-cyclic graphic link in a
hierarchical hyperbolic space. Every node has exactly one primary parent
(except for the root); the link from a node’s primary parent to the node is a
primary path. The primary parent is the first parent of a node. It is defined at
the time the hyperbolic graph is created. The other parents linked to that same
node are called secondary parents.

Secondary Path (non-tree/cross link): A node may have additional
secondary parents; the link from a node’s secondary parent to the node is a
secondary path.

Invisible Link Node: A node, such as node D, that serves as an invisible
link node contains both the primary path and the secondary path.

Primary Sub-Space Nodes: A sub-space parent node and its child
nodes linked by primary paths. For example, node D and its child nodes E and F,
linked by the primary paths DE and DF, are primary sub-space nodes.

Secondary Sub-Space Nodes: A sub-space parent node and its child
nodes linked by a secondary path. For example, node X and its child nodes A
and B, linked by the secondary path DX, are secondary sub-space nodes.

Placeholder: Contains the secondary path between the node and its secondary
sub-space nodes. For example, as illustrated in Figure 3, node D has a
placeholder X that represents an invisible link to node X. Node B has a
placeholder D that contains an invisible secondary link to node D.

As a result, a node (except for the root) has exactly one primary parent in a
directed cyclic hierarchical hyperbolic space. If a node has more than one
parent, then exactly one is designated as the primary parent, and the others are
called secondary parents. For example, in an employee database, the path from
the regular/first manager to the employee is called the primary path. The
path from the temporary/second manager to the employee is called the
secondary path.
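
The definitions above can be made concrete with a small sketch that rebuilds
the example graph with nodes Y, X, D, A, B, E and F. The class Node and the
functions add_link and has_cycle are hypothetical names introduced here, not
part of the described system; the sketch also assumes that the root never
receives an incoming link.

```python
class Node:
    """A node with one primary parent and any number of secondary parents."""

    def __init__(self, name):
        self.name = name
        self.primary_parent = None  # set by the first incoming link
        self.children = []          # primary (visible) children
        self.placeholders = []      # secondary (invisible) children


def add_link(parent, child):
    """The first parent linked to a node becomes its primary parent; every
    later link to that node is kept only as a placeholder on the parent."""
    if child.primary_parent is None:
        child.primary_parent = parent
        parent.children.append(child)
    else:
        parent.placeholders.append(child)


def has_cycle(nodes, include_invisible=True):
    """Depth-first search for a directed cycle over the chosen links."""
    state = {}  # name -> "open" while on the search path, "done" afterwards

    def visit(node):
        state[node.name] = "open"
        links = node.children + (node.placeholders if include_invisible else [])
        for nxt in links:
            if state.get(nxt.name) == "open":
                return True
            if nxt.name not in state and visit(nxt):
                return True
        state[node.name] = "done"
        return False

    return any(n.name not in state and visit(n) for n in nodes)


nodes = {name: Node(name) for name in "YXDABEF"}
links = [("Y", "X"), ("Y", "D"), ("X", "A"), ("X", "B"),
         ("D", "E"), ("D", "F"),   # primary links, created first
         ("D", "X"), ("B", "D")]   # later links become placeholders
for parent, child in links:
    add_link(nodes[parent], nodes[child])

visible_edges = sum(len(n.children) for n in nodes.values())
print(visible_edges, [c.name for c in nodes["D"].placeholders])
```

With only the primary links the structure is a tree of n-1 edges; the cycle
(X, B, D) exists only once the placeholders are taken into account.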




The above graph displays both the primary and the secondary paths.
Note that (1) Node X has 2 parents: primary parent Y and secondary
parent D; thus D has a secondary child, X. (2) Node D has 2 parents:
primary parent Y and secondary parent B; thus B has a secondary
child, D. (3) The link (or path) from D to X is a secondary path; so is
the link from B to D. (4) This directed graph contains a cycle (X, B, D).
(5) Node D and X’s child node B have the invisible link property; i.e.,
they have links to secondary child nodes. In order to represent this
cyclic relationship, we need to introduce two extra lines: one from
Node D to X and one from Node B to D.

The same graph, showing primary links only, appears as a tree. This
is the view a user is presented with when he or she starts navigating.
Nodes B and D, which have the invisible link property, are highlighted
with color.

(1) Node D has 2 parents: primary parent Y and invisible secondary
parent B. It has one placeholder to node X.

(2) Node X has 2 parents: primary parent Y and invisible secondary
parent D.

(3) Node B has one parent, X, and one placeholder to D.

Instead of drawing extra lines (as illustrated in Figure 2), this new technique
uses an Invisible Link Placeholder residing in each of the invisible link
nodes B and D to represent the secondary paths to node D and node X,
respectively.



Figure 3: A Directed Hyperbolic Space with Cycles 

2.2 Invisible Link States and Processing Flow

The invisible link technique is built on a web-based client-server model
with multi-threaded parallelism. This technique provides instantaneous
mapping and unmapping of secondary linked nodes for navigation in a
hyperbolic space. It employs placeholders to maintain link
relationships. An invisible-link processor is used to manage the following five
processing states:

State 1: idle state



A hyperbolic space with invisible links has the same layout as it has without
invisible links. No extra lines or intersections are added to the graph.
An invisible link node A is identified with a unique color. It contains an
invisible placeholder for each secondary path.

State 2: activate state

When a user clicks on a node that has invisible links, a number of
placeholders are displayed from the node. Each placeholder contains an
invisible “secondary path”. The user can dynamically select a “secondary
path” for navigation.

State 3: map/unmap (move) state

The invisible-link processor uses parallel mapping and unmapping methods 
to move the primary sub-space nodes, the selected node with its children, from 
their original structure to the current focused node. A placeholder will be 
placed in the original structure to retain the previous path for mapping. 




State 4: navigation state

After the selected secondary sub-space node and its children are mapped
under the invisible link node, the user can start navigating.

State 5: reset state



At the end of navigation, the invisible-link processor dynamically unmaps the
secondary sub-space nodes B, C and D from node A, and maps the primary
sub-space node B with its child nodes C and D back to the original structure.
The hyperbolic space with the secondary sub-space node and its child nodes
exists only during the secondary path processing. The invisible link node’s
ancestors are checked to continue cyclic mapping of the secondary path.
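
The five states can be summarized in a sequential sketch. It is an
illustration under stated assumptions rather than the described
implementation: the class InvisibleLinkProcessor, its method names and the
"@" placeholder label are hypothetical, the primary tree is kept as a plain
dictionary, the multi-threaded parallelism is omitted, and a link that points
back to an ancestor of the focused node is rejected instead of being mapped
cyclically.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    ACTIVATE = auto()
    MAP_UNMAP = auto()
    NAVIGATION = auto()
    RESET = auto()


class InvisibleLinkProcessor:
    def __init__(self, children, secondary):
        self.children = {p: list(cs) for p, cs in children.items()}  # primary tree
        self.secondary = secondary  # node -> targets of its invisible links
        self.state = State.IDLE
        self.moved = None           # bookkeeping needed by reset()

    def _parent_of(self, node):
        for parent, kids in self.children.items():
            if node in kids:
                return parent
        return None

    def _is_ancestor(self, ancestor, node):
        while node is not None:
            if node == ancestor:
                return True
            node = self._parent_of(node)
        return False

    def click(self, node):
        """State 2: return the placeholders offered by the clicked node."""
        self.state = State.ACTIVATE
        return list(self.secondary.get(node, []))

    def follow(self, focus, target):
        """States 3 and 4: move target's subtree under focus and leave a
        placeholder in the slot it came from."""
        if target not in self.secondary.get(focus, []):
            raise ValueError("no invisible link from focus to target")
        if self._is_ancestor(target, focus):
            raise NotImplementedError("ancestor targets are not handled in this sketch")
        self.state = State.MAP_UNMAP
        origin = self._parent_of(target)
        index = self.children[origin].index(target)
        self.children[origin][index] = "@" + target  # placeholder keeps the slot
        self.children.setdefault(focus, []).append(target)
        self.moved = (focus, target, origin, index)
        self.state = State.NAVIGATION

    def reset(self):
        """State 5: undo the move and return to the idle layout."""
        self.state = State.RESET
        focus, target, origin, index = self.moved
        self.children[focus].remove(target)
        self.children[origin][index] = target
        self.moved = None
        self.state = State.IDLE


tree = {"Y": ["X", "D"], "X": ["A", "B"], "D": ["E", "F"]}
processor = InvisibleLinkProcessor(tree, {"D": ["X"], "B": ["D"]})
offered = processor.click("D")
processor.follow("D", "X")
during = {p: list(cs) for p, cs in processor.children.items()}
processor.reset()
```

While the user navigates, node X and its children hang under node D and the
slot under Y holds the placeholder; after the reset the tree is back in its
original form.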




3. WEB DATA MINING APPLICATIONS 

This paper describes the structure of an invisible link visualization server 
and a client on the Internet for data mining. We use a web browser with Java 
activator to dynamically create a large hyperbolic space on the web. The 
visualization Web interfaces are standard HTML and a Java applet, which are 
used to explore relationships and to retrieve data within a region of interest. 
The server is integrated with the data warehouse and mining engine. The user 
is put in the driver’s seat at the client side to mine the knowledge results. The
system allows the user to access large hierarchies with complex links through 
HTML pages in a Web browser. 

There are many data mining applications with large hierarchical cyclic and
non-cyclic structures that can be mapped into hyperbolic spaces with invisible
links. Figure 2 illustrates a cycle that is formed by OfficeProduct, Microsoft
BackOffice, and SQL Server. A user can reach BackOffice products from
OfficeProduct, or from SQL Server to find other office products. The invisible
link technique enables the user to easily navigate through different links
without being overwhelmed by a large number of nodes and paths. Whether
the graph is cyclic or non-cyclic, we are able to hide its complexity and
provide a simplified hierarchical view to the user. We have applied this
invisible link method to two data mining visualizations with multiple
secondary paths and cycles: a topic hierarchy for document navigation and
customer call troubleshooting.

3.1 Content and Usage Mining 

The first example, shown in Figure 4, visualizes the content and usage
of web sites. A hyperbolic space is constructed to present a topic hierarchy for
millions of documents linked to the web. The topic hierarchy is constructed by
mining the content of the documents and the session logs that record accesses to
these documents. With the invisible link and placeholder capabilities, we are able
to navigate a large, highly connected topic hierarchy on a web browser screen.




Figure 4: A Topic & Content Hierarchies Sample 



3.2 Customer Interview Web Service 

The second example, shown in Figure 5, is a Hewlett Packard internal
web-based troubleshooting application called Interview. We use a hyperbolic
space to organize and view large numbers of questions and answers in a
hierarchical structure. Questions are parent nodes, while answers are arcs to
child nodes. A question can have several answers. An answer can lead to
another set of questions and answers. With the invisible link capabilities,
users are able to navigate through answers to link to another question and
answer group that does not directly belong to the primary path of the
hyperbolic space. The user can easily follow the knowledge values to search
for recent patches, technical tips, and editions.

There are many different kinds of data mining information like usage 
statistics, areas of new knowledge requests, or even search result matches, that 
can be mapped into the hyperbolic space. The user can analyze this additional 
knowledge through navigation. 




Figure 5: An Interview Application Sample 



4. CONCLUSION 

Data mining applications in the 1990s have continuously faced difficulties in the visual
mining of massive, highly connected data sets on the Internet. To date, many practical
applications have shown the usefulness of hyperbolic space visualizations [4, 7]. The
invisible link technique described in this paper enables the mining of large hierarchical
graphs with much better visual clarity.

Invisible link enables the user to easily navigate through different links without 
being overwhelmed with a large number of nodes and paths. Our technique should be 
extensible to a 3D graph hyperbolic space. In addition, the invisible link technique has 
been implemented in several prototypes at Hewlett Packard Laboratories.

Abstract While methods for retrieving documents from large information
repositories have improved a lot, presentation of the retrieved documents
still leaves a lot to be desired. Important information on documents is
usually presented as a textual listing of available metadata attributes
such as document size, author information, date of creation, and so on.
This requires the user to read and abstract from the presented
meta-information.

In this paper we present our libViewer system, a Java applet interfacing
with a number of servers to provide an intuitive, metaphor-graphics-based
representation of document repositories. Contrary to most other
multidimensional data visualization approaches, we rely on intuitive
real-world metaphors to provide a visualization for untrained users rather
than experts in special interfaces. We introduce a set of metaphors and
present two prototype systems interfacing with Dublin Core metadata
based repositories as well as the AltaVista search engine. We further
provide a first usability evaluation based on comments obtained from
users.

Keywords: Information Visualization, Metaphor Graphics, Digital Libraries,
Metadata, Document Databases, User Interfaces, Usability

1. INTRODUCTION 

Electronic document repositories have come a long way from the first
file-listing based archives to modern database systems allowing complex
query processing. Research in Information Retrieval further provided us
with additional means to access and extract knowledge from these vast
sources of information. However, with these resources opening up to
the so-called ‘general public’ by providing access via the Internet, more
effort has to be, and is being, invested in providing access methods that are
usable by untrained users rather than optimized for experts only.

One of the shortcomings of the current representation of document
databases lies in the fact that documents are commonly represented
as sorted lists, with additional information, such as date of creation,
document size, author etc., given as rather long textual descriptions. In
order to decide whether a document is relevant or not, the user has to
read these descriptions or use additional filtering and sorting criteria to
extract the documents he or she considers most interesting. While this
may be feasible for expert users, it proves rather cumbersome for
non-experts, who usually do not have a concept of which criteria to filter
for. This is simply because filtering for document types such as
‘journal’ or ‘hardcover books’ or retrieving ‘only documents with more than
100 pages’ or, even worse, ‘200 KB’ are not natural selection criteria
in their known environment. Even analyzing the provided metadata to
find out which concepts are available and which ranges they cover (time
period, sizes, available document types) involves a lot of reading and
interpretation.

On the other hand, taking a look at conventional libraries, we find
a wealth of information conveyed by the physical representation
of both the library and the books. On entering a library we find books
sorted by content rather than by a specific relevance ranking criterion,
allowing users to quickly identify the topics covered by the library and
the extent to which they are covered (based on the area assigned to
specific topics), as well as to locate their section of interest if appropriate
library maps and descriptions are provided. Within a shelf, by scanning
the books sorted there, it is usually easy to tell the age of a book, the
number of times it has been used before, as well as the amount and type
of information to be expected in the books simply by looking at them.
The cover of the book, the title, type of binding, the condition of the
binding (brand new versus well-used and almost torn apart), the size of
the book, color and other properties of an item on the shelf contain a
wealth of information that most people are accustomed to and able to
interpret intuitively. Thus it is easy for us to gain an intuitive overview
of the contents of a library and the type of information present. What
we want in the context of a document database is an intuitive graphical
representation of the metadata usually provided only in textual form,
which allows us to get an overview of the available information at a
glance.

With the libViewer we present a tool for visualizing the documents
contained in a document database based on the metadata provided by
the library system, to allow the user to gain an intuitive overview of
the type and amount of information available. Contrary to other
document repository visualization approaches, we do not rely on abstract
and dynamic mappings of concept spaces to abstract multidimensional
visualization attributes. We rather favor the use of known concepts such
as spatial location, physical representation, signs of intensive usage or
dustiness, all well known from conventional library settings. Our main
focus is to allow non-expert users to understand and feel familiar with a
digital library system. While the resulting rather artistic representation
may not satisfy expert users who, if trained on a system, usually prefer
powerful computational features to fancy representations, non-computer
experts respond quite enthusiastically to the libViewer representation and
consider it to be very helpful. The capabilities of our system are
demonstrated on two prototypical applications, followed by an analysis of
the feedback obtained from users.

The remainder of this paper is organized as follows: Section 2 presents
an overview of metadata standards for digital document collections
forming the basis for visualization. We next present the metaphors
implemented in the libViewer system in Section 3, followed by a brief
description of the libViewer’s client-server architecture in Section 4.
Examples of the libViewer interfacing with two servers are presented in
Section 5. Based on these experiments we present some usability evaluations
as well as lessons learned for future modifications in Section 6. A comparison
of our libViewer visualization with other approaches to document space
representations is provided in Section 7, followed by some conclusions
and an outlook on future work in Section 8.

2. THE INFORMATION: LIBRARY METADATA

As for all types of information repositories, information about the 
pieces of information stored in them is provided in terms of metadata. 
In conventional libraries we usually find library catalogues for the various 
metadata attributes, listing book titles, authors, printing date and so on. 
In the field of digital libraries, a huge number of initiatives deal with
the development of metadata standards for digital collections.

As one of the older examples of such metadata definitions for
documents we might consider the BibTeX system designed by Oren Patashnik
to create bibliographies in conjunction with the LaTeX document
preparation system (Lamport, 1994). Fourteen different types of documents
are described by a set of 24 attributes, providing a wide range of
metadata in a rather flexible way.

One of the most extensive metadata formats is MARC (Machine Readable
Catalogue Format), which originated in the 1960s as a means of
exchanging library catalogue records. It has evolved into a number of
derivative standards like USMARC in the United States, UNIMARC
for international library data exchange and so on. It provides a highly
complex set of attributes for describing documents and is highly
developed for bibliographic and bibliographic-like data. However, due to
its complexity, the creation of correct MARC records requires trained
specialists, limiting its application to professional library organizations.

There also exist a number of different standards for digital
library metadata designed for special application arenas, like CSDGM
(Content Standard for Digital Geospatial Metadata), CIMI (Computer
Interchange Format of Museum Information), CDWA (Categories for the
Description of Works of Art) and many more.

One of the most promising newer standards for digital libraries is the 
metadata set developed by the Dublin Core (DC) Metadata Initiative 
(http://purl.oclc.org/dc/). It consists of a set of 15 basic attributes 
such as title, creator, subject, publisher, date of creation, etc. used to 
describe digital documents. While the exact specification of some at- 
tributes is not yet defined, the attributes as such have been agreed upon 
and are now being used in a number of projects to describe anything 
from webpages to digital archives. 
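
To illustrate how compact such a description is, the following sketch
represents a Dublin Core record as a plain dictionary. The fifteen element
names follow the Dublin Core Metadata Element Set; the helper make_record and
the sample values are hypothetical and serve only as an illustration.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def make_record(**fields):
    """Build a record holding every Dublin Core element; elements that are
    not supplied are stored as None, unknown names are rejected."""
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return {name: fields.get(name) for name in DC_ELEMENTS}


# Hypothetical sample values.
record = make_record(
    title="An Example Technical Report",
    creator="A. Author",
    date="2000-05-10",
    language="en",
    format="text/html",
)
print(record["title"], record["publisher"])
```

All elements are optional, which is why a record may leave most of them
empty, as in the example above.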

To switch to a completely different arena of metadata representation,
we might consider the page descriptions returned from Internet
search engines as another type of metadata specification. Although they
differ considerably in style and extent from one search engine to
another, we can still identify a number of attributes describing
the various pages, such as title, author of a page, location (URL), date
of creation, relevance towards a query, the size of the page etc.

Similar metadata is provided with document collections which come
in the form of book stores on the Internet such as Amazon
(http://www.amazon.com), which also provide meta-information about the
books on sale. This usually includes, apart from the standard description of
a book, store-specific data such as recommendations or prices, special
offers and so on.

For any of these types of metadata an intuitive visual representation
would provide the user with a possibility to obtain a better overview of
the documents presented to her or him without being forced to actually
read the metadata. The goal of the libViewer library visualization is to
provide a graphical representation for the available types of metadata
in a way that is instantly recognized and interpreted by users without
requiring special training or understanding of the concept of the
underlying metadata.

3. THE VISUALIZATION: METAPHORS 

In order to support this intuitive visualization, the libViewer
provides a number of metaphors, which are easily identifiable and relate to
properties known from the real world (Cole and Stewart, 1993). These
metaphors, in accordance with (Tufte, 1990), are used to (a) label the
resources so that they become intuitively graspable, (b) measure them,
i.e. provide quantitative information, (c) represent or imitate reality
and (d) enliven or decorate the library representation. Based on these
premises we identified a set of metaphors to visualize the various
metadata attributes in a library setting, where the mapping of attributes to
metaphors needs to be flexible enough to allow personalization of the
resulting visualization. Among the metaphors identified we find:

■ Representation Type: Each piece of work in a digital library needs 
a physical representation. A set of templates is defined to represent 
e.g. hardcover books, paperbacks, binders, manuscripts, boxes for 
audio, video and software components or links to other libraries 
to provide a realistic visualization of library resources. This set 
can be extended to cover new types of resources as the application 
domain requires. 

■ Color: Being a very dominant feature, color can be used to rep- 
resent a variety of attributes in a very distinguishing way, such as 
language, publication series, genre, topical classification etc. How- 
ever, the fact that it is an abstract rather than a metaphorical 
mapping has to be kept in mind. 

■ Size: The amount of information available in a book or magazine is 
intuitively judged from the size of the physical object by its spine 
width, e.g. the number of pages, thus measuring the amount of 
information available from a specific resource. 

■ Format: Format conveys, next to the type of a document, a lot of 
information on the genre of a document, considering, for example, 
oversize format books such as an atlas or art collection books vs. 
small paperbacks. Thus, the format can be used in a variety of 
ways to further distinguish between a huge number of document 
types and genres by imitating reality. 

■ Logo: When browsing a library, one recognizes the logos of
well-known publishers, associating them with special types of
publications. Thus, while making the library representation look more
realistic and rather decorating the books, a lot of information can
be conveyed by having a company logo printed on the spine.

■ Text: Although the amount of text found on the spine usually is 
limited to a few words, a wealth of information is provided by both 
the text, such as title or author listing, as well as the type of text 
representation, like different fonts or font colors. 

■ Highlighting Glares: Books and other items that have been added 
to a collection only recently usually can be identified at large dis- 
tance by their somewhat shinier color. Thus, glare effects and 
reflections can be used to highlight certain entries in a collection. 

■ Dust: Whereas items in a library that are frequently consulted 
tend to remain rather ’clean’, dust usually settles on books that 
have not been referenced for a long time. 

■ Well-thumbed Bindings: Contrary to recently added items, books 
that have been in a collection for a long time and which are being 
consulted frequently show some signs of intensive usage by crip- 
pled, well-thumbed bindings etc. 

■ Spine Alignment: When taking a look at bookshelves we find
that books that are being used frequently usually are not neatly
aligned with all the other books nearby, but rather tend to stand 
out. In terms of query processing, this metaphor may be used to 
indicate the relevance of a resource with respect to a specific query 
by promoting easier picking. 

■ Location: We support the concept of an array of bookshelves in 
order to have, similar to conventional libraries, resources on iden- 
tical topics located next to each other. This information can either 
be provided in terms of a classification attribute, or can be
dynamically created based on content analysis, as e.g. with the SOMLib
digital library system (Rauber and Merkl, 1999).

Based on these metaphors we can define a mapping of metadata 
attributes to be visualized, allowing the easy understanding of docu- 
ments, similar to Chernoff faces for multidimensional space representa- 
tion (Chernoff, 1973). However, great care must be taken in the selection 
and definition of these multi-functional elements, so that the encodings 
can be broken by every user, avoiding the creation of graphical puzzles 
(Tufte, 1983). These mappings depend on the metadata available in the 
respective information repository and are taken care of by the respective
servers.

Figure 1 libViewer Architecture: Applet connecting to a number of servers



4. LIBVIEWER ARCHITECTURE 

The libViewer is a Java applet interfacing with a number of servers
providing the data to be visualized. Its main task is to provide a library
representation that is intuitively understandable by the untrained user
by relying on concepts and metaphors taken from conventional, real-world
libraries. It relies on a server to provide the appropriate mapping
from the metadata attributes available in a specific document repository
onto (a subset of) the supported metaphors.
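
The server-side mapping described above can be sketched roughly as
follows. This is a minimal illustration under stated assumptions, not the
libViewer implementation: the attribute names (type, language, pages,
publisher, title), the template and color tables and the BookDescription
record are all hypothetical and were chosen only for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup tables; a real server would derive them from its repository.
TEMPLATES = {"book": "hardcover", "report": "binder", "article": "paper"}
COLORS = {"en": "blue", "de": "yellow"}


@dataclass
class BookDescription:
    """Metaphor-based description of one document (illustrative field names)."""
    template: str          # Representation Type metaphor
    color: str             # Color metaphor
    spine_width: int       # Size metaphor, in pixels
    logo: Optional[str]    # Logo metaphor
    text: str              # Text metaphor


def map_record(record: dict) -> BookDescription:
    """Map one metadata record onto the subset of metaphors it can support."""
    pages = record.get("pages", 0)
    return BookDescription(
        template=TEMPLATES.get(record.get("type"), "paper"),
        color=COLORS.get(record.get("language"), "grey"),
        spine_width=max(8, min(60, pages // 10)),  # clamp to a drawable range
        logo=record.get("publisher"),
        text=record.get("title", ""),
    )
```

Attributes missing from a record simply fall back to neutral defaults, so
a repository with sparse metadata still yields a drawable description.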

A conceptual view of this architecture is depicted in Figure 1. The
libViewer applet contacts one of a number of available servers. These
servers provide access to various information repositories, which are
available locally to them in terms of databases or library files.
However, they can also rely on other servers to provide the information
they need, thus acting as meta-servers. They retrieve the requested
information from the appropriate source and provide a mapping of the
available metadata onto a number of metaphors supported by the
libViewer. This description is returned to the libViewer via a simple
protocol. The libViewer in turn receives the metaphor-based description of
the documents and uses it to create a graphical representation. This
concept allows the libViewer to serve as a general interface to document
repositories of all kinds, with the servers being responsible for
providing appropriate mappings.

5. LIBVIEWER AT WORK

We currently have implemented two prototype servers for the
libViewer system representing two different application domains. A
preliminary version of the system is available online at
http://www.ifs.tuwien.ac.at/ifs/research/ir/somlib/libviewer.html for
interactive exploration.

The DCServer provides a mapping for a modified Dublin-Core (DC)
based metadata set. With the DC set being designed specifically for
digital document collections, it covers a broad range of types of
metadata typically found in document databases. We furthermore extend
the basic attribute set with a few attributes typically collected during
library operations such as usage statistics for documents. This allows us
to demonstrate the full capabilities of the libViewer representation.

The second server implemented so far provides a mapping of the meta- 
data returned by the AltaVista search engine, allowing the libViewer 
to be used as an alternative and more intuitive interface. While be- 
ing less extensive than the DC metadata set in terms of the available 
attributes, this setting, apart from making a prominent application, al- 
lows a straightforward comparison of conventional forms of representa- 
tion with the enhanced visualization of the meta information. 

5.1. DCSERVER: VISUALIZING THE DUBLIN CORE

The DCServer provides a mapping of Dublin-Core based metadata
onto the libViewer metaphors. While we are currently concentrating
on the document-oriented subset of the Dublin Core system, we have
extended the basic Dublin-Core metadata set by additional attributes
that are typically collected during library operation. These attributes
include the number of times a specific document has been referenced, or
the date when it has last been referenced. This allows us to demonstrate
most of the features available with the libViewer within one application.

Figure 2 depicts a representation of a digital library file as created by
the DCServer. A number of different document types such as hardcover
books, paperbacks, technical reports and papers can be easily identified
by their corresponding physical representations, such as the libViewer
and somViewer technical reports in green binders, the 4 different
Langenscheidt dictionaries as yellow hardcover books or various paperback
books published by e.g. Springer. They are created by assigning each
resource type a corresponding document type representation. In the given
example, both journal papers and conference papers are mapped
onto the paper representation metaphor. The difference between
conference and journal papers is indicated by their color, with the latter
appearing in a darker color than the white conference papers. Technical
reports as well as documentations are mapped to the binder metaphor,
with their subdivision in this particular mapping being indicated by
different vertical sizes of the binders. Thus, the hierarchy of document
types defined in the DC metadata can be mapped onto a hierarchy of
metaphorical representations. While theoretically a mapping for the
whole hierarchy of the Dublin-Core specified document types can be
created that way, evaluation has shown that a rather high-level
mapping of metaphors suffices and even enhances intuitiveness, since many
of the subdivisions available in the metadata are rather unimportant to
users looking for specific information.

Figure 2 libViewer: Visualizing DC metadata of documents in a digital library

Further attributes are mapped in a similar fashion, e.g. having the
logo identify the publisher of a book if a corresponding logo is available
(e.g. Springer, Langenscheidt, IEEE), or having the thickness of the
binding represent the size of the underlying resource, as e.g. for the
different Langenscheidt dictionaries. Another straightforward mapping is
provided by the degree to which dust has accumulated on the back of the
books, ranging from a few dust particles to a spider web covering half of
a book that has not been referenced for a long time, as is the case for
the fifth book on the lower shelf. On the other hand, the third book on
the lower shelf is clearly identified as being frequently referenced by
its rather distorted, well-thumbed binding. Albeit hardly noticeable in
the printed representation, we find a highlighting glare on the first
book on the upper shelf, indicating, similar to shiny new books in
libraries, that it was added to the library only recently.

Figure 3 libViewer: (a) Overview Representation (b) Sorted by Publisher

Furthermore, some books like the first ones on the upper shelf as well
as most binders are not aligned with the backs of all the other books,
making them stand out and thus promoting easier picking. Contrary
to that, some books like the third on the upper shelf or the second on
the lower shelf have been pushed far into the back of the shelves. The
alignment can thus be used to indicate some kind of relevance
recommendation or, with respect to electronic book stores, indicate
promotions.
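
How usage statistics might drive the dust, well-thumbed binding and glare
metaphors can be sketched as below. The function name, the returned keys
and in particular all thresholds (30, 90 and 365 days, 50 references) are
assumptions for illustration only; the paper does not state the values
used by the DCServer.

```python
from datetime import date


def usage_metaphors(added: date, last_referenced: date,
                    times_referenced: int, today: date) -> dict:
    """Derive dust, binding wear and glare from usage statistics.

    All thresholds are illustrative assumptions, not values from the system.
    """
    idle_days = (today - last_referenced).days
    age_days = (today - added).days
    if idle_days > 365:
        dust = "spider-web"      # not referenced for a long time
    elif idle_days > 90:
        dust = "particles"
    else:
        dust = "none"
    return {
        "dust": dust,
        # long in the collection and consulted frequently
        "well_thumbed": age_days > 365 and times_referenced >= 50,
        # added only recently
        "glare": age_days <= 30,
    }
```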

In order to obtain an overview of the documents present in the
library, a view from a distance is provided in Figure 3a. At this level,
only the most dominant attributes are displayed, such as document type,
thickness of the binding and the color, whereas the more detailed
representations such as dustiness, text on the binding etc. are, as in
conventional libraries, only visible in the close-up representation.
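
The two levels of detail can be pictured as a simple filter over the
metaphor description of a document. The sketch below assumes a dictionary
representation and hypothetical attribute names; only the three overview
attributes named in the text are kept at the distant view.

```python
# Attributes shown in the overview; names are assumptions for this sketch.
OVERVIEW_ATTRIBUTES = {"template", "spine_width", "color"}


def visible_attributes(description: dict, close_up: bool) -> dict:
    """Return the attributes to draw at the requested zoom level."""
    if close_up:
        return dict(description)
    return {k: v for k, v in description.items() if k in OVERVIEW_ATTRIBUTES}
```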

The documents in the library can be sorted based on the metadata
available. Figure 3b provides a view of the same library, again at the
detailed representation level, sorted by publisher. Additional information
is provided by the label on the shelf, which in this case gives the title
of the first and last document in the row. Again, this is basically decided
upon by the server, although it can be modified by the libViewer based
on the sorting criteria.

Visualizing Electronic Document Repositories 105

Figure 4 libViewer: (a) Documents sorted into shelves (b) Close-up view of shelves

If the server provides a classification of documents, a more advanced
shelf representation is provided by the libViewer as depicted in Figure
4a. Documents are organized into different shelves, with some
information on the topic being provided as shelf labels. The detailed view
of these shelves is again provided in Figure 4b. In this example, we find
the shelf position in terms of rows and columns printed as shelf labels,
allowing the users to orient themselves. Again, the most appropriate
information, such as topical labels for the shelves, should be selected by
the server.
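
Sorting, row labels and the grouping into shelves can be combined in a
few lines. The following is a sketch under assumptions: documents are
plain dictionaries, the classification is stored under a hypothetical
"topic" key, and a row label is built from the first and last title as
described for Figure 3b.

```python
from collections import defaultdict


def row_label(row: list) -> str:
    """Label a row with the titles of its first and last document."""
    return f"{row[0]['title']} - {row[-1]['title']}"


def shelve_documents(docs: list, sort_key: str, per_row: int) -> dict:
    """Group documents by topic, sort each group, split it into labelled rows."""
    by_topic = defaultdict(list)
    for doc in docs:
        by_topic[doc.get("topic", "unclassified")].append(doc)
    shelves = {}
    for topic, group in by_topic.items():
        group.sort(key=lambda d: d[sort_key])
        rows = [group[i:i + per_row] for i in range(0, len(group), per_row)]
        shelves[topic] = [(row_label(r), r) for r in rows]
    return shelves
```

Documents without a classification end up on a single "unclassified"
shelf, which corresponds to the flat view used when the server provides
no classification.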



5.2. AVSERVER: VISUALIZING SEARCH RESULTS

The AVServer serves as a more intuitive visualization for the search 
results returned by the AltaVista search engine. With the metadata 
returned by search engines being both different and less detailed than 
the Dublin Core, it provides a good proof of the strengths of even a 
limited set of the libViewer metaphors. 

Figure 5 AVServer Mapping: (a) 100 hits returned by AltaVista (b) Close-up view
of results

A query is entered via the libViewer interface and passed on to the
AVServer, which in turn forwards the query to the AltaVista search
engine. It then automatically retrieves the first 10 result pages, i.e.
the first 100 results found by AltaVista, and provides a mapping of the
available metadata. Based on the standard result pages from AltaVista we
can extract a number of attributes including the title of a document, the
location (URL) and thus the domain it is located in, the size of the
document, the time it was created, and the language it is written in.
Furthermore, a short description is provided by AltaVista based on the
first few lines of the document. These attributes are then mapped onto
the appropriate libViewer metaphors by the AVServer, such as the document
size being mapped onto the spine width of the document, the title of the
HTML page being printed on the spine together with a logo representing
the domain of a document, the document language being mapped onto the
color and so on.
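
A rough sketch of such a mapping for a single search hit is given below.
The field names of the hit (url, title, language, size_bytes), the linear
size scaling and the function name are assumptions; the two color schemes
are the ones the paper reports for English and German documents, with
grey for documents without a language identification.

```python
from urllib.parse import urlparse

# Color schemes reported in the paper; grey when no language was returned.
LANGUAGE_COLORS = {"english": "blue-red-white", "german": "gold-black-yellow"}


def map_hit(hit: dict) -> dict:
    """Map one search result onto libViewer metaphors (field names assumed)."""
    host = urlparse(hit["url"]).hostname or ""
    return {
        "text": hit.get("title", ""),
        "color": LANGUAGE_COLORS.get(hit.get("language"), "grey"),
        # one pixel per kilobyte, clamped to a drawable range (assumption)
        "spine_width": max(8, min(60, hit.get("size_bytes", 0) // 1024)),
        # top-level domain, later used to select a logo
        "domain": host.rsplit(".", 1)[-1],
    }
```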

A representation of the query result for +rauber +andreas is presented
in Figure 5a. The first 100 results returned by AltaVista are depicted
in the order they were returned by the query engine. The features
represented at the small-scale representation allow us to obtain a quick
overview of the 100 result pages. The titles of the first and the last
document on each shelf are given as shelf labels, together with the
corresponding document numbers. Language, as far as returned by AltaVista,
is encoded in terms of color, with the blue-red-white documents
indicating English-language documents and gold-black-yellow indicating
German-language documents. Documents for which AltaVista did not return a
language identification are colored grey. We also find the different
document lengths indicated by the width of the spine to be easily
discernible. Additionally, even at that distance, we can easily
differentiate between documents that have been created recently and thus
have a highlighting glare as opposed to older documents.

Figure 6 Details Window: Detailed textual representation of metadata

As the mouse pointer moves across the shelves, more information on
the respective pages is shown both in the status bar, where the title
and the document description are listed, and in a details window as
presented in Figure 6, where all the metadata available on the respective
document is listed.

Clicking on the middle of the first shelf gets us to the close-up
representation depicted in Figure 5b. Here we can obtain more detailed
information on the documents. The title of each page, if available, is
printed on the spine. The domain where a document is located is
indicated by a logo on the spine, with national flags serving as logos
for national domains, such as the Austrian, German or Swiss flag, and
labels serving as logos for domains such as com, edu, org, as e.g. for
the last document on the lower shelf. If no logo is available for a given
domain, an unknown-domain logo is mapped onto the spine, as for the eighth
document on the upper shelf, which happens to be located in Hong Kong.
Clicking on a document or on the button in the details window opens
the appropriate document in another browser.
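
The logo selection rule just described amounts to a lookup with a
fallback. In this sketch the logo identifiers and the contents of the two
tables are hypothetical; only the three cases (national flag, generic
label, unknown-domain logo) come from the text.

```python
# Hypothetical logo identifiers for the domains named in the text.
NATIONAL_FLAGS = {"at": "flag-austria", "de": "flag-germany", "ch": "flag-switzerland"}
GENERIC_LABELS = {"com", "edu", "org"}


def logo_for(domain: str) -> str:
    """Pick the spine logo for a top-level domain."""
    tld = domain.lower().lstrip(".")
    if tld in NATIONAL_FLAGS:
        return NATIONAL_FLAGS[tld]
    if tld in GENERIC_LABELS:
        return f"label-{tld}"
    return "unknown-domain"   # no logo available yet, e.g. for hk
```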

6. USABILITY EVALUATION 

Following the first prototype implementations we presented the system
to a number of users, consisting of librarians, computer scientists
and non-computer-science students, to obtain some feedback on their
view of its usability. Most of them were immediately fascinated by the
graphical representation, and the metaphors turned out to be, for the
most part, rather self-explanatory.

While at the beginning we intended to provide a rather sophisticated
mapping of attributes and combinations of attributes to metaphors, we
found that rather simple mappings usually suffice to provide the
necessary information. One of the initial goals was to make the books in
the library resemble their real-world counterparts as closely as possible.
Apart from sophisticatedly mapped publisher logos, this led to
combinatorial mappings of, e.g., document type and publisher
onto color. For example, dictionaries published by Langenscheidt
were colored yellow, whereas dictionaries by e.g. Pons were
colored green. However, while this very real-world-like representation
allowed users to easily recognize the books and to spot and name them
correctly even in large collections, it turned out to be unnecessarily
complex for more general applications.

Apart from the obvious document type representations, we found the
size of the document to be the most helpful feature, as it is
much easier to judge from the visualization than from the textual
representation. However, especially with the AltaVista server, we found
the differences in document length for small documents to be barely
discernible on some queries, calling for a more adaptive
mapping from document size to spine width employing a logarithmic
mapping function. It turned out that this also corresponds to some
extent to the real world, where for most larger documents, such as books,
a thinner type of paper is used, so that spine width does not increase
linearly with the number of pages.
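
One possible form of such a logarithmic mapping is sketched below. The
function name, the pixel range and the assumed maximum document size are
illustrative choices, not values taken from the libViewer.

```python
import math


def spine_width(size: int, min_width: int = 8, max_width: int = 60,
                max_size: int = 1_000_000) -> int:
    """Map a document size onto a spine width using a logarithmic scale."""
    size = max(1, min(size, max_size))                 # clamp to [1, max_size]
    fraction = math.log(size) / math.log(max_size)     # 0.0 .. 1.0
    return round(min_width + fraction * (max_width - min_width))
```

With these defaults, doubling the size of a small document changes the
spine width by a few pixels, whereas a linear mapping over the same range
would leave the two documents visually indistinguishable.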

The highlighting glare used to identify new documents turned out to
be another very prominent feature, although some users did not notice
this metaphor when they first came across it, suggesting that it should
be made more dominant.

We obtained some mixed reports on the mapping of language onto
color in the AltaVista server, which some people found to be perfectly
intuitive whereas others were somewhat irritated at the beginning.
However, all of them interpreted that metaphor correctly after a short
time, and it turned out to be one of the most helpful features for telling
relevant documents apart.

The country flag as logo helped a lot to locate the documents people
were looking for, especially if the query was not formulated very
precisely, and when people were looking for information they knew was
available in a specific country. Complaints were of course filed when no
flag for a specific country was yet available, and we keep working on
improving that list. Some (computer-literate) people noted that the
country flags were too dominant as opposed to the labels for non-national
domains such as '.com', '.org' and '.net', which may contain pages from
national sources, yet people might fail to identify them simply because
they are not listed with the appropriate flag.

The classification of documents into separate shelves turned out to be
very helpful in terms of segmenting a larger number of books. Although
we have so far not included the automatic classification of documents
returned by AltaVista as part of the SOMLib digital library system
(Rauber, 1999), simply the segmentation into smaller chunks as well as
the more realistic representation of the library in the small-scale
representation met with a promising response in terms of usability.

As people kept interacting with the system, some new features that 
should be included were listed. Sorting the books in different ways al- 
ready helped people a lot once they identified the feature that was most 
relevant to them, such as the domain. However, especially for larger 
numbers of books, filtering unwanted book types or domains would have 
been helpful. 

One of the issues raised by some users concerned the orientation of
books. Although the text on the spine consists of only a few words, which
generally can be read without much effort and without actually turning
one's head, some argued that a pile of books might be easier to read
than the vertical shelf position, as the text would be horizontal
rather than vertical. In fact, the shelf position of books is
only required in the real world, where gravity prevents you from simply
picking a book out of the bottom of a pile. With digital libraries, having
the books piled on top of each other as several piles rather than sorted
onto a shelf would not pose this kind of problem. Still, the effect
of re-orienting the books has to be evaluated, as some people opposed
it, saying they might feel uncomfortable with such a representation,
as books 'hovering in space' are unnatural. This orientation question
needs to be analyzed in more detail.

With respect to the graphical representation, some users wanted a
3-dimensional view of the library allowing them to actually move through
the library and pick up books. Still, as most users did not have any
special 3-d plug-ins available on their systems, not to mention special
3-d viewing devices, they felt sufficiently satisfied with the
2-dimensional representation, especially as the 3-d effects in terms of
book and shelf representation were considered more than realistic enough
to create a 3-dimensional impression. However, it definitely merits
further consideration.

7. RELATED WORK

With the massive increase of the amount of information available in 
digital form, sophisticated methods for dealing and interacting with elec- 
tronic information repositories were developed. In this Section we pro- 
vide pointers to research work addressing a variety of issues in infor- 
mation space visualization, ranging from content-related approaches via 
information organization to document visualization as such. 

Research in Information Retrieval (IR) has produced a number of sys- 
tems allowing, apart from mere database searches for titles and authors’ 
names, full text scanning of large text corpora, retrieving documents on 
specific topics or describing special concepts (Hahn et al., 1996; Hearst,
1994; Salton et al., 1993). Apart from document retrieval, systems
analyzing a set of documents to provide question answering are emerging
(Aliod et al., 1998), which try to analyze the semantics of a question and 
try to create an answer based on the information stored in an underlying 
document collection. 

While most of these methods allow the selection of a subset of entries 
of a digital library, we are still left with the problem of (a) identifying 
those items of interest from the sometimes still huge subset of items re- 
turned by search engines and (b) locating relevant information when no 
(more detailed) query can be identified as such. This problem can be de- 
scribed as document archive browsing or archive exploration as opposed 
to document retrieval. In order to be able to browse a collection of docu- 
ments we need it to be visualized in a way that allows us to get an instant 
overview of the information present. This necessity of an enhanced li- 
brary representation has been addressed in a number of projects, trying 
to provide convenient access to digital document collections. 

To overcome the basic limitations of the one-dimensional ranked- 
list representations of most search engines, we developed the SOM- 
Lib digital library system (Rauber and Merkl, 1998; Rauber, 1999; 
Rauber and Merkl, 1999), a 2-dimensional map display which auto- 
matically organizes a set of documents by their contents (available at 
http://www.ifs.tuwien.ac.at/ifs/research/ir/somlib). The self- 
organizing map (SOM) (Kohonen, 1995), a popular unsupervised neural 
network model, is used to produce a content-based document clustering. 
This approach has been used in a number of other projects for document 
classification so far (Kaski et al., 1997; Lin et al., 1991; Merkl, 1997).
A web-based interface allows the interactive exploration of documents,
with the spatial organization of the collection allowing documents on
similar topics to be found close to each other, which is similar to
real-world library organization. This capability makes the SOMLib
representation particularly useful in digital library exploration.
However, in spite of the 2-dimensional topical clustering, no further
meta-information on the documents can be extracted from the standard
SOMLib Web interface.

Another map-based representation of documents is provided by the 
Nemo project (Hascoet and Soinard, 1998), showing the main attributes 
of a set of documents as icons of different color, patterns and text, how- 
ever without support for automatically organizing the documents ac- 
cording to their content. Still, the visualization must be viewed rather 
in the perspective of multidimensional information visualization, not fo- 
cusing on intuitively interpretable real-world metaphors. 

An approach for visualizing the contents of texts is presented in (Rohrer 
et al., 1998), where the main concepts of a text are used to span a mul- 
tidimensional shape which is rendered to form 3-dimensional shapes, 
allowing the detection of documents on similar topics as documents ex- 
hibiting a similar shape. 

One of the first applications of metaphors in the digital library arena is
reported in the Bookhouse project (Pejtersen, 1989), where a document
database is represented as a storehouse consisting of different rooms. A
number of search strategies can be followed, which in turn are indicated
by various images, such as a clock to search by the time dimension, a
globe for search by geographic location of books etc. However, apart
from metaphors for different interaction mechanisms, no visual
representation for various types of documents and for available metadata
is created, making the libViewer a prominent complementary system.

A set of various visualization techniques for information retrieval and 
information representation purposes was developed at Xerox PARC as 
part of the Information Visualization Project (Robertson et al., 1993).
Information is depicted in a 3-dimensional space with the focus being 
on the amount of information visible at one time and an easily under- 
standable way of moving through large information spaces, focusing on 
the visualization of the content rather than the metadata of documents. 

At the CNAM library, a virtual reality system is being designed for the
visualization of the antiquarian Sartiaux Collection (Cubaud et al., 1998).
The binding of each book is being scanned and mapped into a virtual
3-dimensional library to allow the user to experience the collection as
realistically as possible. Here, the purpose is not so much to provide
intuitive access to information, but rather to allow the user to
experience the real collection in a virtual setting.

While all of these methods address one or the other aspect of document,
library, and information space visualization, none of them provides
the wealth of information presented by a physical object in a library, be
it a hardcover book, a paperback, or a video tape, with all the
information that can be intuitively told from its very looks. Thus, they
still have to rely on textual metadata to present important information on
the documents. This obviously calls for a combination of the various
approaches in order to obtain a digital library usable by everybody
instead of experts only.

8. CONCLUSIONS

We have presented the libViewer system as a general interface to
electronic document collections. It provides an intuitive graphical
representation of documents based on the metadata available. Instead of
reading textual descriptions of documents, their types, sizes, age etc.,
metaphor graphics are used to convey this information in a
self-explanatory way. As a Java applet, the libViewer may contact a number
of servers which provide the mapping of metadata available in the
repository they serve onto the metaphors supported by the libViewer
client. We have demonstrated the capabilities of our system using two
servers. The DCServer connects with Dublin-Core based library description
files and provides mappings demonstrating the wealth of information that
can be conveyed to the user in graphical form. The AVServer provides a
mapping from the results returned by AltaVista, allowing users to get an
improved, intuitive visual representation of the documents found.

Both systems have proven to be very helpful in initial usability
evaluations and the graphical representations were highly appreciated. Some
issues raised during these evaluations, such as the orientation of books 
or a different calculation of the spine width to allow for better separation 
between small and large documents, are currently under investigation. 
While initially being conceived solely as a representation system, addi- 
tional functionality requested by users will make the libViewer evolve 
into an application supporting more flexible query interfaces as well as 
additional interaction and result modification facilities. 

Following the promising results of our initial evaluations with the 
DCServer and the AVServer we now plan to implement servers to some 
other widely used document repositories. These shall serve as a broader 
basis for more advanced usability evaluations.

Abstract Contents-based retrieval of multimedia information has been
investigated in several research projects. In this paper, we focus on an
automatic indexing method for human motion data. We convert motion
data, which is represented as a time series of 3-D positions, into a
symbol sequence. We call this conversion automatic indexing. The
automatic indexing is performed using a pattern matching approach.
Reference patterns are necessary for pattern matching, so we propose
two methods to define primitive motions in order to make reference
patterns. The first method divides motion data into segmental motion
data by detecting changes in motion speed. The second method classifies
segmental motions such that similar segmental motions are gathered in
the same cluster. In order to evaluate the similarity between two
segmental motions, we use the Dynamic Time Warping (DTW) method,
because each segmental motion takes a different length of time even if
the same person performs the same motion. Motion data can thus be
converted into a symbol sequence which represents a sequence of
primitive motions. Then, the Continuous Dynamic Programming (CDP)
method is used to recognize the contents of a motion. CDP is one of the
extensions of DTW. It makes it possible to recognize a motion with ease
even if it is complex.

Keywords: primitive motion, motion database, motion recognition, contents-based 
retrieval, dynamic time warping 

1. INTRODUCTION 

The need to deal with human motion data on computers is growing
in the fields of movies, video games, animation and so on [Stuart et al.,
1998]. Storing motion data is also important so that it can be reused.
Since motion data is difficult to describe using keywords, it is better
to use motion data itself as the query to access the database. This is
what is called contents-based retrieval. A database system for
contents-based retrieval accepts vague queries and performs a best-match
search to find data that are likely to be most relevant to the queries.
Contents-based retrieval is based on features that contain clues about
the content of data. These features are generated by an automatic
indexing process.

In this paper, we focus on an automatic indexing method for motion
data [Osaki et al., 1999]. The method is based on speech recognition
techniques, because motion is similar to speech in the sense that both
are time series data. Two major approaches have been proposed in speech
recognition research: the probabilistic model based approach [Kuhn
et al., 1995] and the pattern matching approach [Sakoe et al., 1971]. In
both approaches, speech is phonetically transcribed to phonemes, and
phonemes are compared to reference patterns or models. The problem
is that primitive motions have not been studied sufficiently to employ
the speech recognition methods. Therefore, we have to develop both a
method to extract primitive motions and an automatic indexing method.

After primitive motions are extracted, each of them is represented by
a symbol. That is, motion data can be converted from a huge amount
of position data into a small number of symbols. This makes motion
data easy to analyze but still leaves two problems: one is the wide
variety of motion patterns, and the other is the indexing error due to
the noise that occurs when motion data is converted into a symbol
sequence. To solve these problems, we use Continuous Dynamic
Programming (CDP) [Hayami et al., 1984]. CDP is a method for connected
word recognition. It uses a dynamic programming approach to align the
time series with a specific reference pattern so that some distance
measure is minimized. Since the time axis is stretched (or compressed)
to achieve a reasonable fit, a reference pattern may match a wide
variety of actual time series. By using CDP, connected motions can be
recognized, and the indexing error is reduced by the compression of
the time axis.

2. RELATED WORKS 

Many researchers focus on the analysis of human motion because of its
variety of applications. For example, Rohr [Rohr, 1994] proposed a
model-based approach for the recognition of walkers. He used a 3-D
model to represent the human body and applied a Kalman filter to
estimate the model parameters. Gavrila et al. [Gavrila et al., 1995]
proposed a model-based tracking and recognition system. Their system
contains two components: (1) taking real image sequences acquired from
multiple views and recovering the 3-D body pose at each time instant,
and (2) representing and recognizing human movement patterns.

Figure 1 An example of motion capture system.

Generally speaking, motion recognition can be divided into two
phases: retrieving appropriate motion data and recognizing the
content of the motion data. Most researchers stress the former phase
and try to retrieve 3-D motion information from 2-D images. However,
this is difficult because images are influenced by many conditions,
such as the background, the person, the clothes and so on. In
addition, the more complex the movement is, the more difficult the
extraction of motion information becomes because of serious occlusion.
Thus, the human motions that can be recognized are limited to simple
ones, such as waving hands, raising one’s hand, walking and so on.

In contrast, we are interested in the latter phase. We use an optical
motion capture system to record time series of 3-D motion data.
Figure 1 shows an example of a performer wearing 24 markers, which
reflect infrared rays, on the joints. Each motion data is represented
as the 3-D locations of each joint.

3. EXTRACTION OF PRIMITIVE MOTIONS 

First of all, our system divides all the motion data into segmental
motions [Pavlidis et al., 1974] [Das et al., 1998] by detecting
changes in motion speed. Next, it classifies similar segmental motion
data into the same cluster by using the nearest neighbor algorithm
with DTW as the distance function. We call each cluster a primitive
motion.

We have to find the breakpoints where primitive motion data are
combined. Generally, every motion can be analyzed as follows: start
motion, transitional motion, and stop motion. A start motion
represents a change from zero speed to a movement, a transitional
motion represents a change in the speed, and a stop motion represents
a change from a movement to zero speed. Thus, each change in the speed
of a body part can be considered as a breakpoint. However, vibration,
such as hand shaking, may be detected as breakpoints by mistake. For
this reason, the algorithm calculates the variance between all the
candidate points. If the variance is small, it means that vibration
occurred between the candidate points. These points should be
neglected and are removed from the candidate points. This process is
executed for the x, y and z axes respectively.

Figure 2 An example of checking speed: (a) before checking speed,
(b) after checking speed.

Figure 3 Discernment of breakpoints (axis label of both panels: time
sequence).

Figure 2 shows an example of this process. Figure 2 (a) is the
original data, and (b) is the result of checking speed. In Figure 2
(b), a circle represents a breakpoint. Most of the breakpoints are
correct, but some of them are not. For example, four breakpoints
(white points in Figure 2 (b)) are detected by mistake. They should
not be considered as breakpoints. To remove such breakpoints, another
process is executed to evaluate the spatial relationship between
candidate points. Figure 3 shows an example, which represents motion
data in one dimension for simplicity.

The motion data in Figure 3 (a) includes ten points, five of which are
candidates for breakpoints (black points). Figure 3 (b) shows the
breakpoints that were detected by our algorithm. One of the original
candidates (“L” in Figure 3 (a)) was removed in Figure 3 (b).

The algorithm works as follows. Let us consider a sequence of three
candidate breakpoints (i.e. “F,” “M” and “L”). The distances from “F”
to all the candidate points are calculated in order to find the point
most distant from “F.” In the same way, the point most distant from
“L” is found. For example, let $d_j$ ($1 \le j \le 4$) represent the
distances from “F” to the four candidate points, and let the point at
distance $d_2$ be the most distant from “F.” If “M” is the most
distant point from neither “F” nor “L,” slight vibration occurred in
the motion data. In other words, “M” is unsuitable as a boundary
between primitive motions, and it is removed from the candidates.
The next three candidate points are then considered, until all the
candidate points have been checked. Finally, we obtain Figure 3 (b)
as the result.

Dynamic Time Warping. DTW is a method that was developed in the
field of speech recognition. DTW calculates the similarity of
discrete time series, and we can judge from this similarity whether
two discrete time series are the same or not. As an example, assume
that two discrete time series A and B are given as follows:

$$A = a_1, \ldots, a_i, \ldots, a_M$$
$$B = b_1, \ldots, b_j, \ldots, b_N \qquad (1)$$

where $a_i$ and $b_j$ are the $i$-th and $j$-th points of A and B.
The similarity between A and B is given by the following formulas (2)
and (3).

$$d(i,j) = \sqrt{(a_i - b_j)^2} \qquad (2)$$

$$S(i,j) = d(i,j) + \min\{S(i-1,j),\; S(i-1,j-1)\}$$
$$\mathrm{DTW}(A,B) = S(M,N) \qquad (3)$$

Although this similarity function is sufficient for one-dimensional
data, such as speech, we must extend it to three-dimensional data for
motion recognition. We therefore introduce formula (4) in place of
formula (2):

$$d(i,j) = \sqrt{(a_{xi} - b_{xj})^2 + (a_{yi} - b_{yj})^2 + (a_{zi} - b_{zj})^2} \qquad (4)$$

where a, b, i and j are the same as in formula (1), and x, y and z
represent the axes of 3-D space. In order to deal with 3-D position
data as a spatial time series, it is necessary to evaluate the x, y
and z position data simultaneously.
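
The DTW similarity with the 3-D point distance of formula (4) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, the boundary condition S(0,0) = 0 is assumed, and the recurrence uses the common three-predecessor form, which adds S(i, j-1) to the two predecessors of formula (3) so that either series may be the longer one.

```python
import math


def point_distance(a, b):
    """Euclidean distance between two 3-D points, as in formula (4)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def dtw(series_a, series_b, dist=point_distance):
    """Accumulated DTW distance S(M, N) between two point series."""
    m, n = len(series_a), len(series_b)
    inf = math.inf
    # s[i][j] holds S(i, j) with 1-based i and j; row 0 and column 0 are
    # sentinels, and S(0, 0) = 0 is an assumed boundary condition.
    s = [[inf] * (n + 1) for _ in range(m + 1)]
    s[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Assumption: S(i, j-1) is allowed in addition to the two
            # predecessors printed in formula (3).
            best_prev = min(s[i - 1][j], s[i - 1][j - 1], s[i][j - 1])
            s[i][j] = dist(series_a[i - 1], series_b[j - 1]) + best_prev
    return s[m][n]


# The same trajectory performed at two different speeds.
slow = [(0, 0, 0), (1, 0, 0), (1, 0, 0), (2, 0, 0)]
fast = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
same_motion = dtw(slow, fast)
```

Because the time axis is warped, the slow and the fast version of the same trajectory obtain a distance of zero although they differ in length.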

4. AUTOMATIC INDEXING 

There are two problems that make it difficult to analyze motion data:

1 Boundary detection of the motion data.

2 Running time of the analysis algorithm on huge data, such as
motion data.

Therefore, we convert motion data into symbol sequences in terms of
primitive motions. We call this conversion “automatic indexing.”

Automatic indexing is executed with the following procedure. Suppose
that we have F varieties of motion data $M_1, \ldots, M_f, \ldots, M_F$
as references. Each motion $M_f$ represents a primitive motion, such
as throwing a ball, jumping and so on. $M_f$ consists of the 3-D time
series data for each body part, $m_p^f$, where p is the identifier of
the body part.

First of all, we divide each $m_p^f$ into segments
$s_{p,1}^f, \ldots, s_{p,n}^f$ and classify them into sets
$C_{p,1}, \ldots, C_{p,k}$, where k is the number of clusters for
each body part. To each cluster $C_{p,i}$ we assign a symbol
$a_{p,i}$, so that each indexed sequence is over the alphabet
$\{a_{p,1}, \ldots, a_{p,k}\}$. The indexed $m_p^f$ is obtained by
looking for the cluster $C_{p,j}$ that contains the data most similar
to each segment $s_{p,i}^f$, and by replacing $s_{p,i}^f$ with the
corresponding symbol $a_{p,j}$. Each symbol is the index of its
segment. Then,

(5)

We call the result a symbol sequence, and the pair of the reference
motion and its symbol sequence is stored in the database as a
reference pattern.

We also divide the input motion data $m_p^{input}$ into segments
$s_{p,1}^{input}, \ldots, s_{p,n}^{input}$. The symbol sequences of
$m_p^{input}$ are obtained by:

1 finding, among the segments in the clusters, the segment that is
most similar to $s_{p,t}^{input}$, where the similarity is given by
the DTW function; and

2 using the symbol that corresponds to the cluster of that segment.

Thus $m_p^{input}$ is converted into the following sequence.

(6)

Symbols are thus assigned to every $s_{p,t}^{input}$, and the input
motion data is also converted into a symbol sequence.
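
The indexing step can be illustrated with a small Python sketch. It is a hedged illustration only: the names `nearest_symbol`, `index_motion` and `toy_distance` are hypothetical, the clusters are given as a plain dictionary from symbol to member segments, and a simple stand-in distance on one-dimensional segments replaces the DTW function that the text prescribes.

```python
def nearest_symbol(segment, clusters, distance):
    """Return the symbol of the cluster that holds the most similar member."""
    best_symbol, best_dist = None, float("inf")
    for symbol, members in clusters.items():
        for member in members:
            d = distance(segment, member)
            if d < best_dist:
                best_symbol, best_dist = symbol, d
    return best_symbol


def index_motion(segments, clusters, distance):
    """Convert a list of segments into a symbol sequence."""
    return [nearest_symbol(s, clusters, distance) for s in segments]


def toy_distance(a, b):
    """Stand-in distance (assumption); the paper uses DTW here."""
    return sum(abs(x - y) for x, y in zip(a, b)) + abs(len(a) - len(b))


# Hypothetical clusters for one body part: symbol -> member segments.
clusters = {
    "a1": [[0, 1, 2], [0, 1, 3]],
    "a2": [[5, 4, 3]],
    "a3": [[2, 2, 2]],
}
input_segments = [[0, 1, 2], [5, 5, 3], [2, 2, 1]]
symbols = index_motion(input_segments, clusters, toy_distance)
```

Passing the distance as a parameter keeps the sketch independent of the similarity measure, so a DTW function can be substituted without changing the indexing code.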



5. MOTION RECOGNITION

Motion recognition is carried out by comparing the symbol sequences
of the input with those of the reference patterns. If the sequences
contain complex motions, it is necessary to find the boundaries and
to match the segmental patterns with the reference patterns. However,
there are many candidates for each boundary, and the choice of a
boundary influences the matching process. To solve this problem, we
introduce continuous DP (CDP) to analyze the symbol sequences.

One of the features of CDP is that the three operations of motion
boundary detection, nonlinear time alignment and recognition are
performed simultaneously; thus, recognition errors that would stem
from a separate boundary detection step or a separate time alignment
step are avoided. The algorithm is forced to match complete motions,
and as a result, the motion boundaries are determined automatically.

Assume that two discrete time series A and B are given as in formula
(1). First of all, CDP calculates the similarity $S_i$ between any
sub-series $a_i, \ldots, a_l$ in A and B by the following formula (7):

$$S_i(l,m) = \min \begin{cases}
S_i(l-2,\,m-1) + 2d(l-1,\,m) + d(l,m) & \text{(a)} \\
S_i(l-1,\,m-1) + 2d(l,m) & \text{(b)} \\
S_i(l-1,\,m-2) + 2d(l,\,m-1) + d(l,m) & \text{(c)}
\end{cases} \qquad (7)$$

where $d(l,m)$ is the distance between $a_l$ and $b_m$. $S_i(l,m)$,
the sum of $d(l,m)$, is normalized by $L_i(l,m)$ to remove the
fluctuation caused by the difference in the length of the time series
$a_i, \ldots, a_l$. Finally, the similarity is given as:

(8)

$L_i(l,m)$ is obtained simultaneously with $S_i$ by using the
following formula (9):

$$L_i(l,m) = \begin{cases}
L_i(l-2,\,m-1) + 3 & \text{(if (a))} \\
L_i(l-1,\,m-1) + 2 & \text{(if (b))} \\
L_i(l-1,\,m-2) + 3 & \text{(if (c))}
\end{cases} \qquad (9)$$

For the symbol sequence, we give the distance between $a_l$ and $b_m$
as follows:

(10)

Figure 4 An example of CDP. Panels (a) to (e) show the input symbol
sequence X and the reference symbol sequences A = $a_1 a_2 a_3$ and
B = $b_1 b_2 b_3$.

Table 1 Clustering results.

Segment name   Number of segments   Number of clusters   Number of errors   Percentage
Left wrist     466                  203                  27                 86.7
Right wrist    469                  187                  23                 87.7
Left elbow     395                  100                  19                 81.0
Right elbow    397                  107                  18                 82.4
Left knee      206                  30                   12                 60.0
Right knee     195                  25                   9                  64.0
Left foot      258                  50                   20                 60.0
Right foot     248                  50                   17                 64.0

Figure 4 shows an example of motion recognition with CDP. Assume
that a symbol sequence X of a body part is given as input, and that
we have two reference symbol sequences A and B for the same body part
(Figure 4 (a)). CDP calculates the similarities CDP(X, A) and
CDP(X, B) (Figure 4 (b) and (c)). Then, it chooses
min{CDP(X, A), CDP(X, B)} = CDP(X, A), as A is the first pattern in X,
and detects the motion boundary between $a_3$ and $b_1$ in X. In the
next step, it chooses min{CDP(X, A), CDP(X, B)} = CDP(X, B), as B is
the second pattern in X (Figure 4 (d) and (e)). Finally, it gives the
recognition result, that is, X = AB.
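
The step-by-step matching described above can be sketched in Python as follows. This is a hedged sketch rather than the authors' code. It assumes the slope-constrained recurrence with path weights 3, 2 and 3 of formulas (7) and (9), a 0/1 distance between symbols (the text does not preserve the actual symbol distance), a start point fixed at the beginning of the remaining input with S(0,0) = 0, and selection of the reference and end point by the smallest normalized value S/L. The names `cdp_scores` and `recognize` are hypothetical.

```python
import math


def symbol_distance(a, b):
    """Assumed 0/1 distance between two symbols."""
    return 0.0 if a == b else 1.0


def cdp_scores(x, ref, dist=symbol_distance):
    """Normalized score S(l, K) / L(l, K) for every end position l in x."""
    n, k = len(x), len(ref)
    inf = math.inf
    # s[l][m] and w[l][m] hold S(l, m) and L(l, m), 1-based in l and m.
    s = [[inf] * (k + 1) for _ in range(n + 1)]
    w = [[0] * (k + 1) for _ in range(n + 1)]
    s[0][0] = 0.0  # assumed boundary: the match starts at the first symbol

    def d(l, m):
        return dist(x[l - 1], ref[m - 1])

    for l in range(1, n + 1):
        for m in range(1, k + 1):
            # Path (b): diagonal step, weight 2.
            candidates = [(s[l - 1][m - 1] + 2 * d(l, m), w[l - 1][m - 1] + 2)]
            if l >= 2:  # Path (a): two input symbols, one reference symbol.
                candidates.append((s[l - 2][m - 1] + 2 * d(l - 1, m) + d(l, m),
                                   w[l - 2][m - 1] + 3))
            if m >= 2:  # Path (c): one input symbol, two reference symbols.
                candidates.append((s[l - 1][m - 2] + 2 * d(l, m - 1) + d(l, m),
                                   w[l - 1][m - 2] + 3))
            s[l][m], w[l][m] = min(candidates)
    return [s[l][k] / w[l][k] if s[l][k] < inf else inf
            for l in range(1, n + 1)]


def recognize(x, references, dist=symbol_distance):
    """Match references one after another, restarting after each end point."""
    result, pos = [], 0
    while pos < len(x):
        best = None  # (score, reference name, end index in x[pos:])
        for name, ref in references.items():
            for end, score in enumerate(cdp_scores(x[pos:], ref, dist)):
                if best is None or score < best[0]:
                    best = (score, name, end)
        if best is None or best[0] == inf_score:
            break  # nothing can be matched against the rest of the input
        result.append(best[1])
        pos += best[2] + 1
    return result


inf_score = math.inf
x = ["a1", "a2", "a2", "a3", "b1", "b2", "b3"]
references = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]}
result = recognize(x, references)
```

In this hypothetical input the repeated symbol a2 is absorbed by path (a), so reference A still matches the first four symbols with a score of zero, and the second step starts just after that end point.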

6. EXPERIMENTAL RESULTS 

Our motion data are based on 24 different aerobic exercises. In this
section, we show three sets of results: clustering results, automatic
indexing results and motion recognition results.

6.1. CLUSTERING RESULTS 

Table 1 shows that classification errors occurred in some cases and
that a large number of clusters were constructed. Assume two human
performers with arms of different lengths. Even if they try to
perform the same motion, their left wrists take different courses.
Our system therefore incorrectly judges these two motion data to be
different, because the similarity of the motions is evaluated by
formula (4).

Table 2 Motion recognition results.

Segment name   Correctly recognized segments   Total number of segments   Accuracy
Left wrist     92                              106                        86.8
Right wrist    83                              104                        79.8



6.2. MOTION RECOGNITION RESULTS 

Table 2 shows the results of connected motion recognition for two
body parts. In Table 2, accuracy is the percentage of correctly
recognized segments out of the total number of segments of the input
motion data. CDP found the boundaries of most of the reference symbol
sequences in the input symbol sequences. The recognition accuracy is
about 80% or higher (79.8% and 86.8%) even though “noise” is included
in the input motion data. That is to say, CDP can reduce the
influence of “noise” and makes the recognition process work
effectively, but further improvement is needed for higher accuracy.
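
As a quick check of the accuracy definition, the figures of Table 2 can be recomputed from the segment counts. The snippet below is only an illustration; the function name `accuracy_percent` is hypothetical, and rounding to one decimal place is assumed from the table.

```python
def accuracy_percent(correct, total):
    """Percentage of correctly recognized segments, rounded to one decimal."""
    return round(100.0 * correct / total, 1)


# (correctly recognized segments, total number of segments) from Table 2
table2 = {"Left wrist": (92, 106), "Right wrist": (83, 104)}
accuracies = {name: accuracy_percent(c, t) for name, (c, t) in table2.items()}
```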

Recognition errors are classified into two groups: errors caused by
CDP and errors caused by the reference symbol sequences, that is, the
reference models. The reason for the former is as follows. CDP
detects the boundaries step by step: it starts the next recognition
just after the end point decided by the previous matching, as shown
in Figure 4 (d). If that boundary is not detected correctly, the next
matching will fail. In other words, an error in the previous matching
affects the next matching result.

The reference models are also important. In this experiment, the
reference for each motion is a single symbol sequence, but reference
models that fit more widely are required for data as varied as
motion data.

7. CONCLUSION 

In this paper, we proposed a method to extract primitive motions
from human motion data and presented results obtained with this
method. We showed that human motions can be decomposed and that the
system can extract the components as primitive motions. We also
proposed a motion recognition algorithm based on symbol sequences,
which represent motions as sequences of the extracted primitive
motions. Our method can easily convert large time series of 3-D
motion data into a small number of strings called symbol sequences.
However, the results shown in Section 6 have to be improved before
the method can be applied to practical contents-based retrieval of
motion data.

Abstract Model-based user interface development environments show
promise for improving the speed of production and the quality of user
interfaces. Such systems usually have separate descriptions of the
domain, task and presentation structure. The Teallach system applies
model-based techniques to the important area of database interfaces,
which increases the importance of domain information. This
information exists in the form of a schema and can be captured in a
high-level format, so that the developer need not build a domain
description from scratch. This paper describes such a Domain Model,
how it is captured and how it contributes to the systematic
development of a user interface.

Keywords: Model-based UIDE, Domain Model, Conceptual Modelling 



1. INTRODUCTION 

Database systems have long been criticised for providing inadequate
facilities for user interaction. Building a user interface previously
required significant amounts of programming to augment the database.
Although most commercial systems now supply integrated support for
application user interface development, the support is neither
systematic nor completely general. Developers cannot re-use
components from previous applications, nor can they go beyond the
limited scope of the interface development environment without
returning to the need to write programs.

The model-based approach [Szekely 1996] is more systematic and
generalisable: the developer first constructs declarative
specifications, or models, of the various aspects of the user
interface, from which the application and user interface are
produced. A Model-Based User Interface Development Environment
(MB-UIDE) permits the description of the following aspects: a domain
model, which describes the structure of the data; a task model, which
describes the structure of the activities; a presentation model, which
describes the user interface features; a user model, which describes
characteristics of various user roles; and a dialogue model, which
describes the interaction between user and application. For a full
account of the capabilities of such models, the reader is referred to
[Griffiths 1998].

Previous MB-UIDEs have been built in the context of applications in
general and have not typically been applied to data-intensive
applications. The Teallach system [Barclay, 1999] has been constructed
to explore the possibility of using the MB-UIDE approach in the
context of Java-based Object-Oriented Database Systems (OODBS).
Teallach supports the specification of Domain, Task and Presentation
Models for the application, providing meta-models for each of these.
Teallach does not, as yet, have a User Model, while Dialogue aspects
are integrated into the Presentation and Task Models. Teallach is
supplied as a single piece of software in which the models can be
developed and integrated before an application is generated.

Focussing on database applications brings the Domain Model (DM) into
a central position. In previous MB-UIDEs, the DM has played a
subsidiary role. Given database involvement, several considerations
change. Firstly, there is already a version of a Domain Meta-Model:
the logical data model of the database system. Secondly, when
considering an OODBS, this model is not consistent from one product
to another. Moreover, when we restrict ourselves to the need to
provide user interface development tools for existing databases, a DM
already exists (the schema) and need not be developed from scratch.

This short paper briefly describes how domain information is extracted
and used in Teallach. More details of the Teallach system can be found
at the Teallach web-site [Teallach, 1999].



2. THE DOMAIN MODEL OF A MB-UIDE

The DM of an application in a MB-UIDE describes the structure of the
data. Specifying it is very familiar to database programmers, as it
is nothing more than conceptual schema design. In fact, the task of
specifying the DM of the application has already been carried out,
since the interface is being developed in terms of an existing
database. Instead of having to specify the DM as one of the main
tasks, the developer only has to pick up the existing database schema
and use it as the DM. To do this, Teallach connects to the database
and retrieves the schema from the meta-data, creating an internal
representation of it, which is the DM. Teallach does not presently
support schema design or DM specification, nor the modification of
either.

The DM is a conceptual model, and the Teallach DM has the goal of
describing as much of the data structure as is required for
describing the user interface. The DM is therefore high level and
describes the database in terms of concepts appropriate to a user’s
understanding of the application domain.

Teallach operates in the context of OODBS [Cooper, 1997] such as
POET [POET, 1999] or Objectivity [Objectivity, 1999], and hence the
DM is object-oriented. The Teallach DM describes data in terms of
classes, methods, inheritance and so on. However, the lack of a
common and agreed OO model is a significant problem. Each OO concept
(inheritance, information hiding, etc.) has a number of divergent
semantics, and the product of these individual variations gives rise
to an enormous number of meanings of the term “object-oriented data
model”, many of which are realised in OODBS products.

Since one of the goals of Teallach is to be platform independent, we
need a single consistent model for our internal representation of the
DM. Here we make concrete use of the main OODBS standardisation
effort, the ODMG standard, which comprises definitions of an object
model, an object schema definition language, an object query language
and language bindings to Java, C++ and Smalltalk. It is only the
Object Model which interests us.

We would have liked to use the object schema definition language (ODL)
to derive a standard description. However, compliance with the ODMG
standard does not extend to the use of ODL in any form. What we can
expect from ODMG-compliance is the existence of a relatively standard
form of the query language and language bindings. Schema descriptions
are not standardised, nor is the underlying data model.

We therefore had to determine, on a product-by-product basis, the
appropriate mechanism for retrieving the schema of a database. For
instance, the POET 5.1 Java Binding [POET, 1997] uses an associated
configuration file which names the classes to be found in the
database. From this, we can begin to create a DM based on the classes
named in the file. Because we are developing our system in Java, we
can then make use of Java’s introspection mechanisms: we find and load
the classes in the database and introspect over them to discover the
variables and methods in the classes. From this we create a complete
description of the DM. In the next section, we consider the form that
this description takes.
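
The idea of building a domain description by introspection can be illustrated with a short sketch. Teallach itself is written in Java and relies on Java's introspection; the Python code below is only an analogue that uses Python's `inspect` module. The class `Employee`, the functions `describe_class` and `capture_domain_model`, and the list of class names standing in for the configuration file are all hypothetical.

```python
import inspect


def describe_class(cls):
    """Describe a class by its annotated fields and its public methods."""
    fields = {
        name: getattr(typ, "__name__", str(typ))
        for name, typ in getattr(cls, "__annotations__", {}).items()
    }
    methods = {}
    for name, func in inspect.getmembers(cls, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private and special methods
        params = [p for p in inspect.signature(func).parameters if p != "self"]
        methods[name] = params
    return {"fields": fields, "methods": methods}


def capture_domain_model(class_names, namespace):
    """Build a description for every class named in `class_names`."""
    return {name: describe_class(namespace[name]) for name in class_names}


class Employee:
    """Hypothetical persistence-capable class."""
    name: str
    salary: float

    def raise_salary(self, percent):
        self.salary *= 1 + percent / 100


# The list of names plays the role of the classes named in a config file.
domain_model = capture_domain_model(["Employee"], {"Employee": Employee})
```

The resulting dictionary lists, for each named class, its fields with their type names and its methods with their parameter names, which is the kind of information the DM needs about each class.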
3. A DATABASE INDEPENDENT DOMAIN MODEL

The DM of Teallach is based strongly on the ODMG Data Model [Cattell
et al., 1997]. This was an important contribution to the success of
the project, since we did not have to invent a model and could expect
a fair degree of conformity between the “ODMG-compliant OODBMS” and
this model. Consequently, capturing metadata in this model is greatly
simplified. Here, we briefly review the ODMG model and what it
supports, before we discuss the implementation which constitutes our
DM.

3.1 The ODMG Data Model 

The ODMG data model provides a standard to which OODBMS products
should adhere. The model is for database use and so it is intimately
concerned with issues of efficiency, data modelling and database
management. The last of these has meant that the ODMG model includes
a view of how databases are structured, has a specification for
transactions, has standard classes for domain types and collections,
and provides for user-detectable keys.

The model comprises several parts: distinct parts which deal with
objects and with literals; support for structured (i.e. record)
values; the distinction between two kinds of class variable, namely
relationships, which are objects with automatically maintained
inverse references, and attributes, which may be literals or objects
but which do not automatically support an inverse reference; multiple
inheritance of behaviour, but single inheritance of state;
database-specific extensions including extents and keys; and
exceptions.

This means that an ODMG schema constitutes a rich description of the 
structure of the data and provides the Teallach system with substantial 
support in building a description of the database that can be of great use in 
the interface development process. 

3.2 The Teallach Domain Model 

The Teallach Domain Model is based heavily on the ODMG Model. 
Unfortunately, since we require a concrete implementation of most of the 
above aspects, we have run into several instances of a lack of precision in the 
ODMG specification. At any point of ambiguity, we have had to take a 
particular view in order to complete our implementation. 

The Domain Meta-Model is essentially a Java package that realises each
of the ODMG concepts as a class. There is a repository class, whose
instances correspond to database schemata. The class descriptions are
created as instances of the metadata classes in the ODMG model, such
as Operation, Parameter, Type, Exception and so on.

Classes have also been created for Databases, Transactions and OQL
Queries. Each of these has special responsibility for encapsulating
some aspect of user interaction with the database. The query class is
important, as it means that the interface can exploit any query
optimisation techniques which have been implemented.

The principal role of the DM is to represent the underlying
application, together with database connectivity and interaction. In
addition, however, it models auxiliary data types that may be required
to describe transient data vital to the runtime operation of the
application and interface. Auxiliary data is also modelled using the
ODMG-derived building blocks, so that the representation of domain
components is orthogonal to their source and persistence. It is vital
to have the ability to model transient data, since any sophisticated
user interface will require the ability to manipulate data which is
passed between interaction components but never stored.

3.3 Creating a Teallach Domain Model 

The Teallach system uses the DM component as a fixed point for
interface development. The developer fetches a schema from the
database system, which is then visualised as a hierarchical display of
classes and their components. Teallach supports the processes of
capturing a DM and describing the other models in any order. To
capture a DM, it proceeds as follows:

• The list of persistence-capable classes in the application is
requested from the database system. In the case of POET 5.1, the
configuration file holds both the names of these classes and the
application name.

• Each of the classes is then introspected in order to discover its
internal structure in terms of fields, methods and exceptions.

This results in a DM represented as a collection of instances of the
DM classes. The information contained within the DM for a given
application can be used in two ways. Firstly, it can assist and inform
the designer in the generation of the other models in the Teallach
system. For example, the functionality available on a particular
class can assist the designer in developing a part of a task model,
while the type of a parameter in the signature of an operation can
enable the designer to decide which presentation widget would be the
most appropriate for displaying or obtaining that information.
Secondly, it can be used by the Teallach system to generate
components of other models for the given application automatically.
The nature of the relationships between the different Teallach models
is discussed in the next section.
4. INTEGRATING THE DOMAIN MODEL WITH THE
OTHER TEALLACH MODELS

After the DM has been captured, Teallach allows a developer to create 
the rest of the interface description in any order. This section shows how the 
development process proceeds from the DM. We start by describing the 
design environment in general, then show how the Presentation Model is 
developed, followed by the Task Model. Finally, we show how the whole 
process is completed. 

4.1 The Teallach Design Environment 

The figure shows a screenshot of the Teallach design environment, which 
consists of editors for the three models within an overall tool environment 
providing project management, editing, model linking and code generation 
facilities. The toolkit uses a desktop metaphor and direct manipulation 
techniques for model construction. 

This environment supports the following tasks:

• the capture of the Domain Model as described above;

• the creation and editing of a Task Model;

• the construction of a Presentation Model from an available toolkit;

• linking the various models; and

• generating the user interface thus described.

The Task Editor supports the specification of both the structure of
user tasks and the flow of information between the models when
carrying out the user’s tasks. A Task Model is a goal-oriented task
hierarchy, with leaf nodes representing user interaction or database
tasks. Non-leaf nodes specify the temporal ordering of their
children, e.g. indicating sequential, parallel or concurrent sub-task
execution. The Task Pane allows the task hierarchy to be edited by
modifying, adding or removing tasks. Furthermore, leaf tasks can be
associated with domain or presentation components as described below.
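
A task hierarchy of this kind can be pictured as a small tree structure. The sketch below is not Teallach's meta-model; it is a minimal Python illustration in which the class `Task`, its attributes and the example task names are all hypothetical. Non-leaf nodes carry a temporal ordering for their children, and leaf nodes carry the name of a linked domain or presentation component.

```python
class Task:
    """Node of a goal-oriented task hierarchy (illustrative only)."""

    def __init__(self, name, ordering=None, link=None, children=None):
        self.name = name
        # For non-leaf nodes: "sequential", "parallel" or "concurrent".
        self.ordering = ordering
        # For leaf nodes: the linked domain or presentation component.
        self.link = link
        self.children = list(children or [])

    def is_leaf(self):
        return not self.children

    def leaves(self):
        """Return the leaf tasks in left-to-right order."""
        if self.is_leaf():
            return [self]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found


root = Task("Browse database", ordering="sequential", children=[
    Task("Connect", link="domain:connect"),
    Task("Show data", ordering="parallel", children=[
        Task("Run query", link="domain:query"),
        Task("Display results", link="presentation:table"),
    ]),
])
leaf_names = [task.name for task in root.leaves()]
```

Editing the hierarchy, as the Task Pane allows, then amounts to adding, removing or modifying nodes of this tree.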

A Presentation Model describes the appearance and surface behaviour
of the user interface. It has both a concrete and an abstract aspect.
Concretely, it is a widget set out of which components of the user
interface are built. Abstractly, it partitions the space of widgets in
terms of their purpose, e.g. display, edit, choose, invoke action.
Teallach supports a repository of widgets registered in terms of both
concrete and abstract representations. The Presentation Pane shows a
tree view of the widgets and offers the ability to toggle between
abstract and concrete views.

Having constructed the three models in the separate sub-windows, there 
are two ways of associating elements from different models: linking and 
generating. Linking creates an association between existing components in 
two models, for instance linking a task to a presentation or domain action. 
Generation creates new components in one model based on components in 
another, for instance generating a task from a DM operation. 

Linking is achieved by drawing a rubber-banded line from an element of
one model to an element of another, to associate the two. This can be
seen in the figure, which shows a link being created between the task
model Connect action task and the domain model connect operation. The
link means that when the generated application executes this task, the
database connection operation in the DM will be invoked. Teallach
currently uses a simple hyperlink metaphor to show associations
between linked model components; this allows the designer to jump to
an associated component by invoking its show linked components
operation from a pop-up menu.

Generating components in one model from components in another is 
achieved by drag-and-drop. The designer simply drags a component from 
one model and drops it at the desired location in the target model. For 
example, the designer may construct a partial task hierarchy corresponding 
to some constructed presentation (or vice versa). Once the new structure has 
been generated, the relationships between components are maintained 
through the services provided by the Teallach store. 

The final step takes the three completed and inter-connected models
and uses them to generate Java source code that presents the
application through the desired interface. The final code will
consist of calls to Swing, calls to the DM classes and a set of calls
mirroring the task structure.

4.2 Presentation and the Domain Model

As described above, there are two ways in which the DM can be used to 
determine the Presentation Model: by linking DM components to existing 
interaction objects; or by generating presentation objects from DM 
components. The former is used if the developer has created an interface 
and wants to link it to the database. The second is used if the database 
structure is used to generate the interface directly. 

There are several ways in which the DM components can be linked to 
Presentation Model components. Among these are: 

• associating an interaction object with a DM operation - e.g., a
button might be placed on the interface to invoke an operation;

• associating an object type with a container, e.g. a dialogue box - in
which case, dialogue box content will be subsequently added; and

• associating an atomic DM component with a primitive interaction
object - for instance, linking a string property with a text field.

There are also several ways in which DM components can be used to 
generate Presentation Model components. Among these are: 

• dropping an operation onto a presentation object type will generate 
an instance capable of invoking the operation; 

• dropping a class onto a complex presentation type will generate a 
default structure; and 

• dropping a primitive object type, such as a string, onto the 
presentation generates a new interaction object, such as a text box. 

In fact, the developer can interleave fragments of the linking and 
generating processes if so required. 

4.3 Tasks and the Domain Model 

The DM can also be used by linking domain information to previously 
defined tasks, or as a basis for generating task descriptions. 

The former may be used to: 

• link a class to a compound task, which, for instance, edits objects of 
that class; or 

• link an operation with an action task which is a leaf node of the task 
hierarchy. 

The latter is used to: 

• generate a new leaf task corresponding to an operation; or 

• generate a new compound task corresponding to the editing of an 
object. 

These possibilities are exactly equivalent to the connections between
Domain and Presentation Models. In this case either the whole Task Model
is laid out first and elements of it are then linked to the DM, or elements of
the Task Model are generated as default activities relating to the DM
components.



5. CONCLUSIONS 

The paper has described a novel use for conceptual modelling which 
brings together two similar technologies that have typically been kept quite 
separate. On the one hand, the world of Model Based User Interface 
Development Environments is concerned with providing just as much 
application description as is necessary to complete a user interface 
description. On the other hand, the world of conceptual data modelling 
promises a similarly high-level description, but one from which an 
implementation data structure can be inferred. Both kinds of model are high 
level, since they provide intuitive descriptions, but are implementation- 
oriented since implementations must be developed from them. 

It is the high-level nature of both of these models which brings them 
together as the DM of the Teallach system. In Teallach, the DM is extracted 
from the schema of an existing Object-Oriented Database. The object- 
orientation implies the existence of a higher level description than is found 
in classical database systems. Moreover, having restricted our attention to 
Java databases, we have tools for discovering database structure and 
transforming it into a consistent internal structure. 

Having achieved such an internal structure, we use it as the basis of 
interface development, by adding two other declarative models - one for 
presentation and one for the task structure. These can either be developed 
completely separately and then linked to the domain description, or can be 
partially developed from the domain information and subsequently enhanced 
and completed. 

We have turned the usual conceptual methodology around - performing a
kind of reverse engineering. Instead of generating a low-level description
from a high-level model, we have created a high-level model from an
installed database and then used it as the basis of application development.
The installed database, being object-oriented, contains enough
self-description to allow a complete representation to be abstracted from it.



Abstract The talk describes methods for similarity search and for rule
discovery in multimedia databases.

Keywords: Data Mining, Multimedia, Indexing 



The talk focuses on the following two problems: given a collection of
traditional or multimedia records, (a) find records that are similar to a
given record, and (b) find "interesting" patterns within the collection.
Thus, the talk is divided into two parts.

The first part examines the indexing problem. The typical example query is:
"given a collection of stock-price movements, find stocks that move like IBM".
We assume that a distance function is given by a domain expert. Our goal is
to quickly find the desired records. We describe the GEMINI methodology,
which is typically used for such environments: the idea is to extract a few
(say, n) numerical features out of every multimedia record, map it to a point in
n-dimensional space, and then use off-the-shelf spatial access methods for
fast insertion and search. We give the condition under which the above method
guarantees no false dismissals, and we describe several settings where it
performed well (time sequences, color images, medical images etc.).
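The filter-and-refine idea behind this methodology can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: records are equal-length numeric sequences, the expert distance is Euclidean, and the n features are scaled segment means (a piecewise aggregate approximation). The condition usually given for "no false dismissals" is that the distance in feature space never exceeds the true distance, which holds for this choice of features. The names `paa_features` and `range_query` are hypothetical, and a linear scan stands in for a spatial access method.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa_features(seq, n):
    """Map a sequence to n features: segment means scaled by sqrt(segment length).

    Assumes len(seq) is a multiple of n. With this scaling the Euclidean
    distance between feature vectors lower-bounds the distance between
    the original sequences.
    """
    seg = len(seq) // n
    return [math.sqrt(seg) * sum(seq[i * seg:(i + 1) * seg]) / seg
            for i in range(n)]


def range_query(records, query, eps, n=2):
    """Return all records within distance eps of the query."""
    q_feat = paa_features(query, n)
    # Filter step: cheap test in feature space (a spatial index would go here).
    candidates = [r for r in records
                  if euclidean(paa_features(r, n), q_feat) <= eps]
    # Refine step: discard false alarms using the true distance.
    return [r for r in candidates if euclidean(r, query) <= eps]
```

Because the feature distance is a lower bound, the filter step can only let through extra candidates (false alarms); it cannot drop a record that the full comparison would have accepted.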

Moreover, we examine user-friendly ways of posing queries. Suppose that
the user wants to find stock-price movements that have the 'head and
shoulders' pattern. There is no straightforward extension to SQL that could
handle such queries. We present 'MindReader', a system which allows the
user to specify multiple examples of desirable objects; MindReader then
studies these objects, figures out the common characteristics and tries to
second-guess what the user really wants.

The second part of the talk describes tools to find patterns in a collection
of multimedia records. The first such tool is 'FastMap', which automatically
extracts n features from each record, trying to preserve the distances. This is
useful both for indexing and for visual data mining: the N multimedia
objects are mapped into points in a low-dimensionality space, where clusters
and other regularities can be found visually. We show how FastMap was
applied to text documents, to time sequences, and to video clips.

Finally, we discuss the powerful Singular Value Decomposition (SVD): 
When the input records are vectors and their distance is the Euclidean distance, 
the optimal dimensionality reduction method is SVD. We show how SVD can 
be used for visual data mining, for compression and for rule detection, on mul- 
tiple real datasets. 




Part VI 



Image Similarity Retrieval 





11 



EFFICIENT IMAGE RETRIEVAL 
BY EXAMPLES 



Roberto Brunelli 

ITC-irst 

Via Sommarive 18, I-38050 Povo (TN), ITALY
brunelli@itc.it 



Ornella Mich 

ITC-irst 

Via Sommarive 18, I-38050 Povo (TN), ITALY
mich@itc.it 



Abstract A currently relevant research field in information sciences is the manage- 
ment of non-traditional distributed multimedia databases. Two related 
key issues are achieving an efficient content-based query by example re- 
trieval and a fast response time. This paper presents the architecture 
of a distributed image retrieval system which provides novel solutions 
to these key issues. In particular, a way to quantify the effectiveness of 
low level visual descriptors in database query tasks is presented. The 
results are then used to improve the system response time, an important 
issue when querying very large databases. A new mechanism to sim- 
plify user queries, featuring local modification of the comparison metric 
in the space of image descriptors, is presented and discussed. 

Keywords: image similarity retrieval, feature extraction, image-based queries, dis- 
tributed databases, clustering. 

1. INTRODUCTION 

The ever-growing amount of multimedia data requires a large
integrated effort in the research fields of Computer Vision, Information
Retrieval and Database Management for its effective management. In
particular, retrieving information from multimedia repositories requires
the development of techniques to supplement traditional methods based
on textual descriptions and searches. The reason for this necessity is
twofold: associating textual descriptions with multimedia data can be very
expensive, and, what is even more important, textual descriptions may
not characterize data adequately for subsequent retrieval. The latter
issue is of particular relevance for multimedia material, whose searching
criteria and features are highly dependent on user goals.

An attempt to overcome this limitation is through query by example,
where non-textual queries are formulated by the user using multimedia
items related to the material he/she is looking for (e.g. images or video
clips for searching footage). Recently, many multimedia retrieval systems
have investigated the query by example framework. Some of the most
relevant are: IBM's Query By Image Content system (Flickner et al.,
1995); Excalibur Visual RetrievalWare™, a comprehensive application
development software to provide content-based, high-performance
retrieval for multiple types of digital visual media; the Visual
Information Retrieval (VIR) Image Engine by Virage, a set of libraries
for analyzing and comparing the visual content of images; and MARS:
Multimedia Analysis and Retrieval System (Porkaew et al., 1999), an
application developed by the Beckman Institute and the Department of
Computer Science at the University of Illinois, whose aim is to integrate
various techniques in the fields of Image Processing and Information
Retrieval into an Image Data Base Management System that is accessible
from the web.

In the query by example framework, the user formulates a query by
providing examples of objects similar to the one he/she wishes to
retrieve. The system converts them into an internal representation used
for assessing their similarity to the items stored in the database to be
searched. The main advantage of query by example is that the user
is not required to provide an explicit description of the items, which is
instead computed by the system. For this paradigm to be effective,
good content descriptions must be computed automatically by the
system, and ways to compare them that give results in accordance with
human judgements should also be available.

This paper discusses the use of pattern analysis techniques, such as
density estimation and clustering, and multidimensional scaling, for the
development of a computer assisted image search system. The
architecture of the system is described in Section 2. The issue of small yet
effective image descriptors is considered in Section 3, while relevance
feedback and query optimization are described in Section 4 and
Section 5 respectively. Some concluding remarks are reported in Section 6.


2. SYSTEM ARCHITECTURE

The structure of an image retrieval system to support the query by 
example paradigm for multiple distributed databases is presented in Fig- 
ure 1. The system is configured as a client-server architecture in which 
a client application can submit a user query to multiple image servers. 
The answers from multiple image servers are then merged and proposed 
to the user as a single result. 

Following the query by example paradigm, users rely on the images
themselves to formulate queries. A generic image X is characterized as
a triple (I, F, M) whose elements represent a complete description of
the image pixels I, possibly indirectly by pointing to the corresponding
memory storage, a derived feature description $F = \{F_i\}$ automatically
computed by the system, and associated meta data M providing
information on image contents. Derived image descriptions can be computed
directly by the client application while meta information is not usually
provided automatically.

A query by example Q is defined by giving a set E of images and,
possibly, by selecting a subset f of F and a comparison strategy S to
be used by the image servers when comparing the query images to those
stored in the database:

$$Q = (E, f, S) \tag{1}$$

The query images can be 

■ local or remote images accessible from the client application: the 
user may provide appropriate meta-data which can be used to 
supplement the visual similarity search with a more traditional 
textual search; 

■ images from a previous query considered as relevant by the user; 

■ images from selected image servers, relying on the browsing func- 
tionalities of the client. 

In order to answer a query, the image server compares the images in the
query set E to the stored ones using strategy S, obtaining a dissimilarity
score for each of them. The dissimilarity of images could be computed
using both the derived descriptors f and image meta-data M. Derived
descriptors are often represented as numerical vectors while meta data
are usually in textual form. The analysis presented in this paper will
be limited to the use of derived descriptors represented as numerical
vectors, leaving out any available meta data in the computation of image
similarity.

As the set of query images must be compared to other images, a
function to compute the dissimilarity of an image from an image set must
be introduced. If we restrict ourselves to metric spaces for the derived
descriptors, the distance between an image X and an image set E can be
computed using the following formula:

$$D(X, E) = \min_{X' \in E} d(X, X') \tag{2}$$

where d represents the distance defined in the metric space.
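As a minimal sketch of the image-to-set distance of Eq. 2, the fragment below treats each image as a flat numerical descriptor vector and uses the L1 distance as the metric d. The choice of L1 anticipates the paper's later preference, and the function names are illustrative only.

```python
def l1_distance(a, b):
    """L1 (city-block) distance between two descriptor vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def set_distance(x, example_set, d=l1_distance):
    """D(X, E): distance from descriptor x to the closest element of E."""
    return min(d(x, e) for e in example_set)


# A database item is as close to the query as its nearest example,
# and an item that is itself one of the examples has distance zero.
query_examples = [[1.0, 1.0], [0.0, 3.0]]
print(set_distance([0.0, 0.0], query_examples))
print(set_distance([0.0, 3.0], query_examples))
```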

The effectiveness of feature comparison is improved by the use of 
relevance feedback which modifies the distance of the metric space using 
information derived from the interaction of the user with the system. 
The servers then sort database items by increasing dissimilarity. The 
set A of the top ranked ones is returned to the client together with 
their dissimilarity value and, if available and requested, associated meta- 
data. The client, upon receiving the answer from each server, sorts the 
resulting complete set by dissimilarity and offers to the user a single 
answer. 
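The client-side merge described above can be sketched as follows, assuming each server returns its answer as a list of (dissimilarity, image identifier) pairs already sorted by increasing dissimilarity. The data layout and the name `merge_answers` are assumptions made for illustration, not part of the described system.

```python
import heapq
from itertools import islice


def merge_answers(server_answers, top_k):
    """Merge per-server sorted answers into a single ranked list.

    server_answers: list of lists of (dissimilarity, image_id) tuples,
    each list sorted by increasing dissimilarity.
    """
    merged = heapq.merge(*server_answers, key=lambda item: item[0])
    return list(islice(merged, top_k))


server_a = [(0.1, "a1"), (0.4, "a2")]
server_b = [(0.2, "b1"), (0.3, "b2")]
print(merge_answers([server_a, server_b], top_k=3))
```

Since every server already sorts its own items, a k-way merge avoids re-sorting the complete set on the client.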

3. IMAGE DESCRIPTION

One of the key issues in querying image databases by similarity is 
the choice of appropriate image descriptors and corresponding similarity 
measures. In a recent paper (Brunelli and Mich, 1999) the problem of 
quantifying the effectiveness of several low level visual descriptors was 
addressed. The proposed solution relies on the following definitions: 

Definition 1 Given an n-dimensional histogram space $\mathcal{H}$ and a
dissimilarity measure d on $\mathcal{H}$ (see Note 1), the capacity curve C
of $\mathcal{H}$ is defined as the density distribution of the dissimilarity
between the two elements of all possible histogram couples within
$\mathcal{H}$.

Histogram capacity curves provide a basis on which the effectiveness,
i.e. the discrimination ability, of different image descriptors can be
compared. The shape of C(t) is an indicator of the distribution of histograms
in $\mathcal{H}$ with the topology induced by the selected dissimilarity
measure. If the average value of dissimilarity is low, histograms are not
sparse enough in $\mathcal{H}$ and histogram indexing is not effective. This
can be formalized by the following definition:

Definition 2 The indexing effectiveness $\mathcal{E}$ of a histogram space
$\mathcal{H}$ is given by the average dissimilarity value:

$$\mathcal{E} = \int y\, C(y)\, dy \tag{3}$$


Figure 1 A general architecture for an image retrieval system based on the query by
example paradigm. The shaded blocks are considered in detail by the current paper.
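Definitions 1 and 2 can be approximated empirically from a finite collection of histograms. The sketch below is only an illustration under stated assumptions: histograms sum to one, the dissimilarity is the L1 distance (so values lie in [0, 2]), the capacity curve is estimated by binning all pairwise dissimilarities, and the effectiveness is their plain average. Function and parameter names are hypothetical.

```python
from itertools import combinations


def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def capacity_curve(histograms, bins=10, d=l1_distance, d_max=2.0):
    """Estimate the density of pairwise dissimilarities over [0, d_max].

    Returns the raw pairwise values and the density sampled on `bins`
    equal-width intervals (the density integrates to one).
    """
    values = [d(a, b) for a, b in combinations(histograms, 2)]
    width = d_max / bins
    counts = [0] * bins
    for v in values:
        counts[min(int(v / width), bins - 1)] += 1
    density = [c / (len(values) * width) for c in counts]
    return values, density


def indexing_effectiveness(values):
    """Average pairwise dissimilarity: a discrete counterpart of Eq. 3."""
    return sum(values) / len(values)


hists = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values, density = capacity_curve(hists, bins=4)
print(density, indexing_effectiveness(values))
```

A descriptor whose histograms are spread far apart yields a curve with mass at large dissimilarities and therefore a higher average.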



The indexing effectiveness $\mathcal{E}$ can be used to assess the performance
of several descriptor-dissimilarity combinations for image retrieval
applications. Several ways to compute image dissimilarity were considered
in (Brunelli and Mich, 1999): Kolmogorov-Smirnov, Kuiper, and
$L_p$ norms. The $L_1$ norm provided the best overall results in terms of
indexing effectiveness and stability with respect to the number of
histogram bins used. A further benefit of using the $L_1$ norm is that it can
be efficiently computed using parallel instructions available on current
personal computers. Usage of these instructions results in a four-fold
increase in the speed at which image distances can be computed, making
the comparison of a million images per second a feasible task. The main
findings of (Brunelli and Mich, 1999) on the discrimination ability of the
following basic image descriptors:

■ hue, H: a scalar descriptor which associates to an (r, g, b) triple
representing the pixel color its tint; the resulting density represents
a circular variable;

■ luminance, l: a scalar descriptor which associates to an (r, g, b)
triple representing the pixel color the normalized sum of its
components;

■ edgeness, E: the magnitude of the gradient
$\sqrt{(\partial_x l)^2 + (\partial_y l)^2}$, where l represents the image
luminance;

■ hue co-occurrence: space S is partitioned into couples of pixels
by means of a binary spatial relation: a pixel located at (x, y) is
associated to a pixel at $(x + \Delta x, y + \Delta y)$ and the tints of the
two pixels are used as indices in a 2-dimensional histogram;

■ luminance co-occurrence: the same as hue co-occurrence using
pixel luminances.

are summarized in Table 1 and Figure 2. Reported data are based on 
two different image sets: 

■ VIDEO: a set of 40000 frames from nine different video clips. The 
video material was varied, ranging from comics, news, to docu- 
mentaries and action movies. 

■ STILLS: a set of 3500 still images from a commercial collection, 
providing more colorful and high quality images than the average 
video material of the above database. 
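Two of the basic descriptors listed above, luminance and luminance co-occurrence, can be sketched as follows. This is an illustration under assumptions not fixed by the text: images are lists of rows of 8-bit (r, g, b) tuples, luminance is normalized by dividing the component sum by 3 x 255, and histograms are normalized to unit sum. All names are hypothetical.

```python
def luminance(pixel):
    """Normalized sum of the (r, g, b) components, in [0, 1]."""
    r, g, b = pixel
    return (r + g + b) / (3 * 255)


def bin_index(value, bins):
    """Index of the histogram bin holding a value in [0, 1]."""
    return min(int(value * bins), bins - 1)


def luminance_histogram(image, bins):
    """Unit-sum histogram of pixel luminances."""
    counts = [0] * bins
    total = 0
    for row in image:
        for pixel in row:
            counts[bin_index(luminance(pixel), bins)] += 1
            total += 1
    return [c / total for c in counts]


def luminance_cooccurrence(image, bins, dx=1, dy=0):
    """Unit-sum 2-D histogram over pixel pairs (x, y), (x + dx, y + dy)."""
    height, width = len(image), len(image[0])
    table = [[0] * bins for _ in range(bins)]
    pairs = 0
    for y in range(height):
        for x in range(width):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < width and 0 <= y2 < height:
                i = bin_index(luminance(image[y][x]), bins)
                j = bin_index(luminance(image[y2][x2]), bins)
                table[i][j] += 1
                pairs += 1
    return [[c / pairs for c in row] for row in table]
```

The hue variants would follow the same pattern with a tint function in place of `luminance`, keeping in mind that hue is a circular variable.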

The effectiveness of the different descriptors can also be used to
optimize the order in which they are compared. In image retrieval tasks,
a threshold on the minimum acceptable similarity is usually imposed to
limit the number of retrieved items. The computation of image
dissimilarity using the $L_1$ norm can be stopped as soon as its monotonically
increasing value exceeds the retrieval threshold. When multiple
histograms are used to characterize an image, they can be concatenated in
many different ways to obtain a single numerical vector describing the
image. However, the order used does matter. Comparing the descriptors
sorted by decreasing effectiveness is expected to increase the
computational savings associated to the use of a retrieval threshold. Experiments
on the same data used in (Brunelli and Mich, 1999) are reported in
Figure 3 and confirm this expectation.
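The early-termination idea can be sketched as below. The fragment assumes descriptors are stored per name as lists of bin values and that an effectiveness score per descriptor is available (for instance the values of Table 1); the function names and the use of `None` to signal rejection are illustrative choices.

```python
def l1_with_cutoff(a, b, threshold):
    """Return the L1 distance, or None as soon as the partial sum,
    which can only grow, exceeds the retrieval threshold."""
    total = 0.0
    for x, y in zip(a, b):
        total += abs(x - y)
        if total > threshold:
            return None
    return total


def order_by_effectiveness(descriptors, effectiveness):
    """Concatenate per-descriptor histograms, most effective first."""
    ranked = sorted(descriptors, key=lambda name: effectiveness[name],
                    reverse=True)
    return [v for name in ranked for v in descriptors[name]]


# Hue is compared before edgeness because its effectiveness is higher.
effectiveness = {"hue": 55, "edgeness": 22}
image = {"edgeness": [0.5, 0.5], "hue": [1.0, 0.0]}
print(order_by_effectiveness(image, effectiveness))
```

Placing the most discriminating bins first makes the partial sum of a dissimilar image cross the threshold after fewer terms.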

Table 1 The effectiveness $\mathcal{E}$ of some low-level visual descriptors.

Descriptor             $\mathcal{E}_{VIDEO}$   $\mathcal{E}_{STILLS}$
Co-occurrence (hue)    57                      68
Hue                    55                      70
Co-occurrence (lum)    52                      50
Luminance              43                      46
Edgeness               22                      32


4. QUERY BY EXAMPLES WITH 
RELEVANCE FEEDBACK 

Relevance feedback is a fundamental mechanism by which system
response can be improved by using information fed by the user (Cox et al.,
1996; Ishikawa and Faloutsos, 1998; Sclaroff et al., 1997; Porkaew et al.,
1999). Whenever the system presents to the user a set of images
considered to be similar to the provided examples, the user can pick among
them the images he/she considers most relevant to the submitted query
and add them to the original query. The resulting extended set $E_r$ can
be used to improve system response in a variety of ways (Porkaew et al.,
1999). A common approach to the implementation of relevance feedback
for a system using image descriptors in numerical form is that of feature
weighting and is based on the vector model used for textual documents.

Image derived descriptors $F = \{F_{ij}\}$ are obtained by binning, with the
same number of bins, the density estimates of the corresponding image
characteristics (e.g. luminance, hue, etc.). Exploiting the homogeneity
of descriptor normalization and dimensionality, the dissimilarity of two
images can be computed by:

$$d(X, X') = \sum_i \alpha_i \sum_j w_{ij}\, |F_{ij} - F'_{ij}| \tag{4}$$

where i represents the i-th descriptor and j the value of the j-th bin
of the descriptor, $\alpha_i$ is a weight assigned to the i-th descriptor and
$w_{ij}$ a weight assigned to its j-th bin. This distance introduces a metric
structure in the derived descriptors space and can be used to compute the
distance of the query set $E_r$ from each database item using the formula
reported in Eq. 2.


Figure 2 The plots report the capacity curves, computed using the $L_1$ distance and
64 bins per histogram, for the image descriptors described in the text. (Plot title:
Image descriptors capacity curves; horizontal axis: Normalized distance.)
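A weighted L1 dissimilarity of the kind used here, with one weight per descriptor and one weight per bin, can be sketched as follows. The exact form and the parameter names `alpha` and `w` are assumptions made for illustration; each image is a list of descriptors, and each descriptor a list of bin values.

```python
def weighted_l1(f, g, w, alpha=None):
    """Sum over descriptors i of alpha[i] * sum_j w[i][j] * |f[i][j] - g[i][j]|.

    f, g : descriptors of the two images (list of lists of bin values)
    w    : per-bin weights with the same shape as f
    alpha: per-descriptor weights; defaults to 1 for every descriptor
    """
    if alpha is None:
        alpha = [1.0] * len(f)
    return sum(
        a * sum(wij * abs(x - y) for wij, x, y in zip(wi, fi, gi))
        for a, wi, fi, gi in zip(alpha, w, f, g)
    )


f = [[0.5, 0.5], [1.0, 0.0]]
g = [[0.0, 1.0], [0.0, 1.0]]
ones = [[1.0, 1.0], [1.0, 1.0]]
print(weighted_l1(f, g, ones))                     # both descriptors
print(weighted_l1(f, g, ones, alpha=[1.0, 0.0]))   # second one excluded
```

Setting a descriptor weight to zero removes that descriptor from the comparison, which is how a user could exclude it.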



Figure 3 The plot reports (in double logarithmic scale) the expected gain in speed
resulting from properly sequencing the image descriptors before comparing them.
Note the significant advantage over the worst case, where the order in which the
descriptors are used is inversely proportional to their capacity. (Plot title: Effect
of descriptors ordering on retrieval efficiency.)

The default set of descriptor weights is {1, . . . , 1} and can be modified
by the user to assign different weights to the image descriptors, possibly
excluding some of them from the computation of d. The set $\{w_{ij}\}$ is
computed by the system and is used to incorporate relevance feedback into
the comparison metric. Relevant images should be similar to each other for
some of the components of their descriptors $F_{ij}$. This means that the
standard deviations $\sigma_{ij}$ computed over set $E_r$ should be small for the
components capturing the similarity of the images and larger for the
components which are not relevant. A method to emphasize distances along
the relevant directions is to use the following set of weights (Rui et al.,
1997):

$$w_{ij} = k \frac{1}{\sigma_{ij}} \tag{5}$$

In this paper a family of weighting schemes is derived from the previous
equation:

$$w_{ij} = k_\beta \frac{1}{\sigma_{ij}^{\beta}} \tag{6}$$

where $k_\beta$ is a normalizing factor, while $\beta$ is a parameter which
modulates the weighting effect and can be varied to optimize image
comparison results. The effect of feature weighting on the computation
of distances in descriptor space is to increase the value of distances along
the directions of minimal variance of the query set. There are some major
drawbacks to the use of Equation 6:

■ the use of $\sigma_{ij}$ tacitly assumes that the images in the query represent
a compact set with ellipsoidal shape;

■ the comparison metric is modified in the same way all over the
descriptor space.

Furthermore, the amount of weighting specified by $\beta$ is expected to be
query dependent and should be optimized on a case-by-case basis. A
way to overcome these drawbacks is presented in the following section.
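The feature-weighting scheme of Equation 6 can be sketched as below. Two details are assumptions of this sketch rather than statements of the paper: a small constant `eps` is added to each standard deviation to avoid division by zero when all query images agree on a bin, and the normalizing factor is chosen so that the weights of each descriptor average to one. The name `feedback_weights` is hypothetical.

```python
import statistics


def feedback_weights(query_set, beta=1.0, eps=1e-6):
    """Per-bin weights proportional to 1 / sigma_ij ** beta.

    query_set: list of images; each image is a list of descriptors and
    each descriptor a list of bin values.
    """
    weights = []
    for i in range(len(query_set[0])):
        n_bins = len(query_set[0][i])
        # Standard deviation of each bin over the images of the query set.
        sigma = [statistics.pstdev([img[i][j] for img in query_set])
                 for j in range(n_bins)]
        raw = [1.0 / (s + eps) ** beta for s in sigma]
        k = n_bins / sum(raw)  # assumed normalization: mean weight of 1
        weights.append([k * r for r in raw])
    return weights


# The three query images agree on the first bin and differ on the others,
# so the first bin receives by far the largest weight.
query = [[[0.5, 0.3, 0.2]], [[0.5, 0.2, 0.3]], [[0.5, 0.1, 0.4]]]
print(feedback_weights(query, beta=1.0))
```

With `beta=0` every weight equals one and the unweighted distance is recovered; larger values sharpen the emphasis on low-variance bins.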

5. QUERY OPTIMIZATION 

The effect of the drawbacks associated to the use of Equation 6 on
the effectiveness and efficiency of relevance feedback can be minimized
by determining whether the specified query set Q, while not being
compact itself, is composed of two or more compact sets: the query could
then be split into simpler subqueries, each of them better suited to the
use of Equation 6. Let us note that this splitting also introduces local
modification in the metric structure of the descriptors space.

The cloud of points representing the query images in the descriptor 
space may exhibit local grouping, i.e. clusters, suggesting the splitting of 
the original query set into multiple subsets, each of them characterized 
by the images belonging to one of the clusters. 

From a data analysis perspective, the relevant issue is whether the 
structure of the point distribution supports the presence of multiple 
clusters or not. There are no completely satisfactory methods to deter- 
mine the number of clusters for any type of cluster analysis (Milligan 
and Cooper, 1985; Jain and Moreau, 1987). The situation analysed by 
the current paper presents additional difficulties due to the small num- 
ber of images used to define the query: no asymptotic results can be 
used, and methods relying on density estimates can not be applied. The 
chosen strategy is based on two steps: 

1 establish whether the original query should be split or not; 

2 if the original query should be split determine the number of clus- 
ters into which it should be split. 

The first step is based on the use of a statistic originally proposed by
Duda and Hart (Duda and Hart, 1973). Let us denote with d(X, X') the
distance between the descriptors of two images X, X' and with J(c) the
clustering criterion function for c clusters $C_1, \ldots, C_c$:

$$J(c) = \sum_{i=1}^{c} \sum_{X \in C_i} d(X, \bar{X}_i) \tag{7}$$

where $\bar{X}_i$ is the central image of the i-th cluster. The quantity J(c) is a
random variable whose average value decreases monotonically with c. In
particular, if data are organized into $\hat{c}$ compact, well separated clusters,
the value of J(c) is expected to decrease rapidly until $c = \hat{c}$, and much
more slowly thereafter. Knowledge of the distribution of J(2)/J(1)
under the null hypothesis that all samples belong to a single cluster forms
the basis for a test to reject or accept the null hypothesis.

Unfortunately, analytical results are often not available. An approximate
result is derived in (Duda and Hart, 1973) when the distance used in the
comparison is the Euclidean norm. As the comparison metric considered in
this paper is the $L_1$ norm, for which results are much harder to obtain, a
Monte Carlo approach was chosen (Dubes, 1987). As detailed in Section
3, each image is represented by histograms of several low level visual
features, normalized to unity. In order to determine the distribution of
J = J(2)/J(1) for different sample sizes n (from 6 to 16), 10000 random
samples were generated that satisfied the image descriptor constraints:
number of features, number of bins, and normalization to unity. For each
random sample the Linde-Buzo-Gray clustering algorithm (Linde et al.,
1980) using the $L_1$ metric was applied 10 times to find the optimal two
cluster partition. The corresponding values of J were then used to
compute the required distributions, which are summarized in Figure 4. As
anticipated, the distributions for different values of n are markedly
different, n being too small to ensure an asymptotic regime.

Given a set of N query images the value of J is computed: if the null
hypothesis of a single cluster can be rejected with the prescribed
confidence, the appropriate number of clusters should then be determined.
The most appropriate number of subqueries into which the original query
should be split is determined by the so called silhouette coefficient S
introduced in (Kaufman and Rousseeuw, 1990). Let us introduce the following
quantities:

$$a(i) = \frac{1}{n_{C(i)} - 1} \sum_{j \in C(i),\, j \neq i} d(i, j)$$

$$A(i, C) = \frac{1}{n_C} \sum_{j \in C} d(i, j)$$

$$b(i) = \min_{C \neq C(i)} A(i, C)$$

where C(i) is the cluster containing element i and $n_C$ is the number of
elements in cluster C; the silhouette of element i is then defined as

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}} \tag{8}$$

When a cluster contains a single object, s(i) = 0. The higher the value
of s(i) the stronger the membership of i to its corresponding cluster.
Elements that cannot be clearly assigned to any cluster have a silhouette
value near to zero. The silhouette coefficient S is then defined as

$$S = \frac{1}{N} \sum_{i=1}^{N} s(i) \tag{9}$$

The value of S is bound to the closed interval [-1, 1]: the higher the
value the better the overall classification of data for the given
clustering. Furthermore, S is a dimensionless quantity that does not change
when the distances between samples are multiplied by a constant factor.
The knowledge of the silhouette coefficient can be used to choose an
appropriate number of clusters $\hat{k}$ so that

$$S(\hat{k}) = \max_{k = 2, \ldots, K} S(k) \tag{10}$$


Figure 4 The plot reports the distribution of the J(2)/J(1) statistic for $L_1$ clustering
using sample points generated taking into account the characteristics of the image
descriptors used in reported experiments. (Plot title: J(2)/J(1) Statistic for L1
clustering.)
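The selection of the number of clusters through the silhouette coefficient can be sketched as follows. The sketch follows the standard definition of Kaufman and Rousseeuw (average distance to the other members of the own cluster, minimum average distance to any other cluster), uses the L1 distance, and assumes that a clustering into k groups has already been computed for each candidate k by some external algorithm. All names are illustrative.

```python
def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def silhouette_coefficient(points, labels, d=l1_distance):
    """Average silhouette over all points; needs at least two clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # the silhouette of a singleton cluster is 0
        a = sum(d(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(d(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_cluster_count(points, clusterings):
    """clusterings maps k to a label list; return the k maximizing S."""
    return max(clusterings,
               key=lambda k: silhouette_coefficient(points, clusterings[k]))


points = [[0.0], [0.1], [5.0], [5.1]]
candidates = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
print(best_cluster_count(points, candidates))
```

In the toy data the two-cluster split keeps both tight pairs together, whereas the three-cluster split creates two singletons whose silhouettes are zero, so the coefficient drops.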

The above computations are used to subdivide the original query
images into several, simpler queries, each of which is better conditioned
for the application of relevance feedback mechanisms (see Figure 5 and
Figure 6). The resulting simplified queries are then submitted to the
image databases. For each simplified query a new comparison metric is
computed according to Eq. 6. As a result, the metric used for image
comparison is no longer a uniform modification of the original $L_1$ distance:
each subquery locally modifies the comparison metric, overcoming one
of the limitations of the original feature weighting approach.

As previously noted, the value of $\beta$ used in Eq. 6 can be tuned to
each query to increase the effectiveness of image retrieval. This can be
done using information derived from the interaction of the user with
the system. During his/her interaction with the system the user selects
some of the images presented by the system as relevant to the current
query and adds them to the query set. At each interaction, the system
returns an image set A with the N database images most similar to
the submitted query. The rank of the relevant images in the sorted lists
returned by the system can be used to determine an optimal value of $\beta$.
Let us restrict to a discrete set $\{\beta_j\}$ of possible values. For each value
$\beta_j$ several synthetic queries can be performed on A by excluding for each
of the queries one of the images of $E_r$ which belongs to one of the queried
databases. Due to the way the distance of each database image from $E_r$
is computed, this ensures that the excluded image belongs to A (as its
distance from the query set is zero). The average rank of the excluded
images over the synthetic queries is then used for the optimization: the
value of $\beta$ providing the lowest average rank defines the comparison
metric. Some examples are reported in Fig. 7.
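The leave-one-out selection of the weighting parameter can be sketched as below. Several simplifications are assumptions of this sketch: each image is a single flat descriptor vector, the weights are recomputed from the reduced query set at every synthetic query, the normalizing factor is dropped because a common scale does not change ranks, and a small `eps` guards against zero standard deviations. Query images must be the same Python objects as the corresponding entries of the answer set.

```python
import statistics


def weights(query, beta, eps=1e-6):
    """Per-component weights 1 / (sigma_j + eps) ** beta over the query set."""
    return [1.0 / (statistics.pstdev([q[j] for q in query]) + eps) ** beta
            for j in range(len(query[0]))]


def set_distance(x, query, w):
    """Weighted L1 distance from x to the closest query image."""
    return min(sum(wj * abs(a - b) for wj, a, b in zip(w, x, q))
               for q in query)


def average_rank(answer_set, query, beta):
    """Mean rank (1 = best) of each query image when it is held out."""
    ranks = []
    for held_out in query:
        reduced = [q for q in query if q is not held_out]
        w = weights(reduced, beta)
        ordered = sorted(answer_set, key=lambda x: set_distance(x, reduced, w))
        ranks.append(next(r for r, x in enumerate(ordered) if x is held_out) + 1)
    return sum(ranks) / len(ranks)


def best_beta(answer_set, query, betas):
    """Candidate value giving the lowest average held-out rank."""
    return min(betas, key=lambda beta: average_rank(answer_set, query, beta))


# Toy data: query images agree on the first component only.
q1, q2, q3 = [0.5, 0.1], [0.5, 0.5], [0.5, 0.9]
others = [[0.30, 0.12], [0.70, 0.88], [0.35, 0.50]]
answer = [q1, q2, q3] + others
print(best_beta(answer, [q1, q2, q3], betas=[0.0, 1.0]))
```

In the toy data the unweighted metric lets unrelated images outrank the held-out example, while weighting by inverse standard deviation emphasizes the component on which the query images agree and brings the held-out example forward.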

Figure 5 The upper figure presents a complex query-by-example with 11 images.
The value of the J2/J1 ratio suggests that the original query should be split, while
the computed silhouette coefficients suggest that the optimal number of clusters is
two. The resulting sub-queries are identified by a different shading of the frames.
Note that the 8 images grouped in the first sub-query exhibit significant variation,
yet the algorithm has correctly grouped them. (Plot title: Splitting criteria;
horizontal axis: Nr. of clusters.)

Figure 6 Another example of a complex query that would be split by the presented
algorithm. The value of the J2/J1 ratio suggests that the original query should be
split, while the computed silhouette coefficients suggest that the optimal number of
clusters is three. The resulting sub-queries are identified by a different shading of the
frames. (Plot title: Splitting criteria; horizontal axis: Nr. of clusters.)

Figure 7 The plot reports the average rank distribution for different values of $\beta$
from two sample cases. (Plot title: Optimal choice of $\beta$; horizontal axis: Weight
exponent.)


6. CONCLUSION

In this paper an architecture for a general image retrieval system
featuring relevance feedback was presented and discussed. Several low level
image descriptors (hue, luminance, etc.) have been compared and their
retrieval effectiveness assessed through their capacity curves. The
results permit the design of scalable image retrieval systems which make
optimal use of computational and storage resources. A novel approach
to relevance feedback has also been presented. In particular, the
possibility of tuning search strategies and comparison metrics to varying
user behaviour was investigated and novel solutions were presented
using pattern analysis techniques.

Notes 

1. In this context, a dissimilarity measure is a bounded, positive, and symmetric function 
defined over a subset of R^n × R^n. 
Abstract Similarity-based retrieval of images is an important task in many 
image database applications. A major class of users’ requests requires 
retrieving those images in the database that are spatially similar to the 
query image. We propose an approach for computing the orientation 
spatial similarity between two symbolic images in this paper. The 
proposed approach is not only rotation invariant, but also captures the 
relative distance and orientation range between objects. 

Keywords: Augmented orientation Similarity retrieval Pictorial database 



1. INTRODUCTION 

In many applications (Huang, 1996; Lee, 1989; Papadias, 1994), images 
comprise the vast majority of acquired and processed data. An image of a 
picture is often stored and represented in two forms, the original full-sized 
image and its symbolic form (Chang, 1987). The symbolic version of the 
image is an object-based image file that consists of icons that represent the 
objects in the picture and features extracted from the original picture. So, 
when a user queries a pictorial database, the retrieval procedure will match 
the query against the symbolic database first. If the features of some 
symbolic database pictures satisfy the query selection criteria, the real 
images are then retrieved and displayed. 

The effectiveness of this kind of image database retrieval depends on the 
correctness and types of image feature representation. Basically, there are 
two kinds of features of an image: visual features (Ooi, 1998) and spatial 
relationship features (Nabil, 1996). Visual features, such as texture, shape, 
and color, could potentially be used as a basis for coarse-grained image 
similarity retrieval (e.g., find images which are predominantly orange). If 
we extracted visual features for individual objects within pictures, we could 
ask more fine-grained queries (e.g., find images containing a red ball). 
However, it is not until we introduce spatial relationships (Ang, 1998; Nabil, 
1995; Sistla, 1995) into queries that we can ask very precise queries such as 
“Find images containing a red ball on the top of a yellow cube.” While a 
general-purpose image query will most likely involve both visual and spatial 
relationship features, in this paper we concentrate only on the latter. 

So far, there are two kinds of representation for spatial relationship 
features: 

(1) Topological relationships, which are invariant under topological 
transformation of the reference objects (Egenhofer, 1991). 

(2) Orientation relationships, which concern partial and total orientation 
relationships among objects. Ang (1998) and Peuquet (1987) have developed 
several different models. 

Topological relationships have been intensively studied in recent years. In 
particular, they have been applied in GIS due to their invariance under transformation. 
Egenhofer (1991) proposed a model called 4-Intersection to formally define 
the topological relationships which stay invariant under topological 
transformation such as translation, rotation, and scaling. According to this 
model, each object is represented in 2D space as a point set which has an 
interior and a boundary. Eight meaningful relationships can be defined in 
this model, namely disjoint(dt), touch(to), overlap(ov), covers(co), 
inside(in), coveredby(cb), contains(ct), and equal(eq). He has also 
subsequently extended his 4-intersection model to 9 intersections. This 
method has been popularly accepted for the representation of topological 
relationships. 

However, it is hard to reach a consensus on the representation of 
orientation relationships, as variance under transformation was thought to be 
an intrinsic attribute of orientation. In this paper, we will concentrate on 
orientation relationships and propose an approach that can represent and 
retrieve orientation relationships invariant under transformation (Petraglia, 
1993; Petraglia, 1994). 
The remainder of this paper is structured as follows: Section 2 discusses 
orientation relationship. Section 3 proposes the Augmented Orientation 
Spatial Relationship (AOSR) representation. Section 4 discusses AOSR 
similarity retrieval. Section 5 describes our experiments for the proposed 
similarity approach. The paper is concluded in Section 6. 



2. ORIENTATION SPATIAL RELATIONSHIP 

Orientation relationships describe where objects are placed relative to 
one another. Three elements are needed to establish an orientation: a primary 
object, a reference object and a frame of reference. In a pictorial database, 
orientation relationship information is more useful than topological 
relationships. To generate a description of a picture based on the objects’ 
relationships automatically, a possible assumption is that all objects in the 
picture are topologically disjoint. This is practical because non-disjoint 
objects are hard to recognise. 

As we know, orientation must be given with respect to a reference frame, 
i.e., the orientation that determines the direction in which the primary object 
is located in relation to the reference object. In a picture in which “The ball 
is in front of the car”, the reference frame is based on the car and its front is 
clearly defined. There are three types of reference frames in the use of 
projective spatial prepositions in natural language: intrinsic, extrinsic, and 
deictic (Hernandez, 1994): 

(1) Intrinsic orientation: The orientation is given by some inherent 
property of the reference object (e.g., the ball is in front with respect to the 
car’s front). Criteria for determining the intrinsic orientation of objects and 
places are among others: the characteristic direction of motion or use, the 
side containing perceptual apparatus, the side characteristically oriented 
towards the observer, and the symmetry of object. 

(2) Extrinsic orientation: The orientation on the reference object is 
imposed by external factors. Relevant factors are the accessibility of the 
reference object, its motion (or that of the observer), other objects in its 
vicinity or the earth's gravitation (e.g., the ball is in front with respect to the 
actual direction of the motion of the car. Thus if the car is moving 
backwards, that direction is considered “front”). 

(3) Deictic orientation: The orientation is imposed by the point of view 
from which the reference object is seen (by an observer within the scene or 
from the speaker’s point of view). 

Normally, the reference frame with respect to which the orientation is 
determined can be the combination of the above three different types. 
Among these three reference frames, the intrinsic orientation is the most 
natural framework of reference for the relative orientations among objects. 
For example, the intrinsic front of a house is usually determined by the main 
entrance of the house, and it is the part of the scene that can be seen by 
looking outward from the main entrance. By using the intrinsic orientation, 
we don’t need to worry about the rotation of the pictures. However, not all 
objects have intrinsic orientations. For example, it is hard to define an 
intrinsic front of a round table or a football. For objects of this kind, their 
self-orientations are not very important to other objects, and hence an 
extrinsic orientation can be applied to them. For ease of discussion, we 
define these two kinds of objects as follows: 

Definition 1: An object with an intrinsic orientation is a regular 
object. An object without an intrinsic orientation is called a 
non-regular object. 

In the following discussion, we will restrict the images to contain only 
regular objects first. We will touch on how to add non-regular objects into 
the symbolic images later on. For simplicity, all objects considered are 
without holes and segments. 



3. AUGMENTED ORIENTATION SPATIAL RELATIONSHIP (AOSR) REPRESENTATION 

Usually orientation categorization is based on the relative positions of the 
centroids of objects. However, for objects with extension, this kind of 
orientation relationship can’t reflect the meaningful information between 
objects. For example, in Fig. 1(a), if we simply say object B is to the 
northeast of object A according to the centroids of A and B, then there will be 
no difference between Fig. 1(b), (c) and (a). However, from a human’s 
viewpoint, they are different. In Fig. 1(a), if the size of object B changes, the 
orientation range of object B relative to object A will change too. In 
Fig. 1(b), the relative distance between objects A and B is different from that 
in Fig. 1(a). The intrinsic orientations of object B in Fig. 1(a) and Fig. 1(c) 
are not the same. Therefore, in the remainder of this paper, we 
propose an Augmented Orientation Spatial Relationship (AOSR) 
representation to capture this information. 
Figure 2. Definition of α and β for Or and Op (The arrows represent the intrinsic 
fronts of objects) 




Figure 3. An example (the arrow represents intrinsic front of Op) 



However, usually the begin-bound angle and the end-bound angle will 
change too when the orientation of object Op is changed. An example is 
shown in Fig. 4. This will cause ambiguity because the change in either the 
relative distance or the orientation may result in the same AOSR for a 
reference object Or and a primary object Op. The problem can be solved if 
we use intrinsic orientation. For example, with the use of the intrinsic 
orientation of the reference object as the reference frame, even if the begin-bound 
angle and the end-bound angle remain the same, the change in the 
intrinsic orientation of the primary object Op in Fig. 4 can still be detected. 
This will cause the orientation spatial relationship to change when Op is used 
as the reference object. Hence it is very important that the intrinsic orientation is 
used in the AOSR to measure angles. 
Figure 4. An example (the arrow represents intrinsic front of Op) 

In the above, what we have defined is the AOSR between a primary 
object and a reference object. Given a picture with n different objects, each 
of the n objects can be chosen as a reference object with the rest as primary 
objects. Therefore, we can define the AOSR of a picture as follows: 

Definition 3: For a picture with n different objects, its AOSR 
representation is an n x n matrix where each row is the AOSR 
between the row index object as the reference object and each 
of the n objects as the primary objects. The row index objects 
and the column index objects are lexicographically sorted. 




Figure 5. An example picture P (all arrows represent intrinsic fronts of objects) 

In the above definition, it is obvious that the matrix is built based on all 
the objects involved since each row of the matrix is the set of the orientation 
spatial relationships of all objects using the row index object as the reference 
object. As an example, the AOSR of picture P in Fig. 5 is as follows (for 
ease of illustration, all angles are integer values): 
           A             B             C 

A        NULL        (210, 240)    (255, 270) 
B      (-15, 15)       NULL        (240, 270) 
C      (45, 95)      (210, 235)      NULL 



Note that the matrix entry P(B, A) = (-15, 15) instead of (345, 15), as β 
must be bigger than α as stated in Definition 2. 
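
As an illustration, the example matrix can be held in a nested dictionary. The sketch below is a hypothetical representation, not the authors' data structure: the helper `normalize_pair` and the layout `aosr[reference][primary]` are assumptions. It only shows how a range that wraps past 0 degrees, such as (345, 15), can be rewritten as (-15, 15) so that β is always larger than α.

```python
from typing import Dict, Optional, Tuple

AnglePair = Tuple[float, float]


def normalize_pair(alpha: float, beta: float) -> AnglePair:
    """Shift alpha down by 360 degrees when the range wraps past 0, so beta > alpha."""
    if beta <= alpha:
        alpha -= 360
    return (alpha, beta)


# Angle ranges of picture P as they would be measured (B's view of A wraps past 0).
objects = ["A", "B", "C"]  # lexicographically sorted row and column indices
raw = {
    ("A", "B"): (210, 240), ("A", "C"): (255, 270),
    ("B", "A"): (345, 15), ("B", "C"): (240, 270),
    ("C", "A"): (45, 95), ("C", "B"): (210, 235),
}

# aosr[reference][primary] -> (alpha, beta); the diagonal is None (NULL).
aosr: Dict[str, Dict[str, Optional[AnglePair]]] = {
    ref: {
        prim: None if ref == prim else normalize_pair(*raw[(ref, prim)])
        for prim in objects
    }
    for ref in objects
}
```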



3.2 HANDLING NON-REGULAR OBJECTS 



In some cases, there is a mixture of regular and non-regular objects. To 
construct each entry of the AOSR matrix, we need to handle the following 
two cases: 

1. One of the objects concerned is non-regular. 

In this case, we use the regular object’s intrinsic front as the non-regular 
object’s intrinsic front. 

2. Both objects are non-regular objects. 

In this case, α is always 0 and β is the subtended angle of the primary 
object with respect to the centroid of the reference object. 

It is straightforward to use only the existing regular objects’ intrinsic 
orientations as the reference frame in the AOSR matrix, since a non-regular 
object’s self-rotation usually does not affect the spatial relationship between 
objects. However, for pictures containing only non-regular objects, our 
AOSR is not applicable at this stage. Therefore, we assume all pictures 
discussed in this paper contain at least one regular object. A special case is 
that there is only one regular object in a picture. In this case, the intrinsic 
orientation of the regular object is the only reference orientation frame in the 
picture. 



4. SIMILARITY RETRIEVAL 

There are two kinds of picture retrieval: exact matching and similarity 
retrieval, with the former a special case of the latter. In similarity retrieval, 
we are to retrieve from the pictorial database those pictures that are similar 
to the query picture given by a user. To do this, we must design a real-valued 
function that can measure the similarity of every pair of pictures. For the 
ease of discussion, we assume that all objects involved are distinct in each 
picture. We ignore the situations in which there are multiple instances of the 
same type of object (such as two cats) within a picture. It has been 
proved (Tucci, 1991) that the similarity retrieval problem is NP-hard when 
multiplicity of objects is allowed in picture matching. 
Similarity retrieval is a fuzzy issue. Usually different similarity functions 
are used for different comparison objectives. Since our AOSR matrix 
representation is object-based, our retrieval procedure will also be 
object-based. In fact, this follows the way humans compare pictures in real life. 
To compare two pictures, we normally compare them by portions, using a 
different object as the reference object in each comparison. Therefore, our 
object-based representation fits well with object-based retrieval. 

Without loss of generality, consider two symbolic pictures Q and P with 
Q the query image and P the database image. To define a similarity based on 
object O1 in Q, the first step is to check whether P contains O1. In the second 
step, the corresponding rows and columns indexed by O1 in the matrices of P 
and Q are used to compute the similarity. Following Gudivada’s (1995) 
assumption, the maximum similarity is set to 100. If there are n objects in Q, 
each of the n-1 tuples involving other objects’ orientations with respect to O1 
contributes up to 100/(n-1) toward the similarity based on object O1. 
Suppose O2 is another object that appears in both P and Q. P(O1, O2) denotes 
the item indexed by O1 and O2 in picture P’s AOSR matrix. If we have: 

$P(O_1, O_2) = (\alpha^P_{12}, \beta^P_{12})$ 

$P(O_2, O_1) = (\alpha^P_{21}, \beta^P_{21})$ 

$Q(O_1, O_2) = (\alpha^Q_{12}, \beta^Q_{12})$ 

$Q(O_2, O_1) = (\alpha^Q_{21}, \beta^Q_{21})$ 

then the four pairs of angles $\alpha^P_{12}$ and $\alpha^Q_{12}$, $\beta^P_{12}$ and $\beta^Q_{12}$, $\alpha^P_{21}$ and $\alpha^Q_{21}$, and 
$\beta^P_{21}$ and $\beta^Q_{21}$ will be used to measure the similarity based on O1 and O2. 
When the angular difference between a pair of angles increases from 0 to 
180 degrees, the similarity contributed by this pair of angles will decrease from the 
maximum to the minimum. However, when the angular difference between 
a pair of angles increases from 180 to 360 degrees, the similarity contributed by 
this pair of angles will increase from the minimum to the maximum. Hence 
we choose the cosine of the difference between the pair of angles, plus 1 to 
ensure that the value is nonnegative. The contribution factor from O2 to the 
similarity based on O1 is defined as 

$$S_{12} = \frac{100}{9(n-1)} \Big[ (\cos(\alpha^P_{12} - \alpha^Q_{12}) + 1) + (\cos(\beta^P_{12} - \beta^Q_{12}) + 1) + (\cos(\alpha^P_{21} - \alpha^Q_{21}) + 1) + (\cos(\beta^P_{21} - \beta^Q_{21}) + 1) + 1 \Big] \qquad (1)$$ 

In formula (1), when $\alpha^P_{12} = \alpha^Q_{12}$, $\beta^P_{12} = \beta^Q_{12}$, $\alpha^P_{21} = \alpha^Q_{21}$, and $\beta^P_{21} = \beta^Q_{21}$, the 
contribution factor is 100/(n-1), where n is the number of objects in the 
query picture. This is the maximum similarity contributed by one object. 
The larger the difference between the corresponding angles in P and Q, 
the smaller the similarity contribution. It should be noted that the minimum 
similarity contribution is not 0 but 100/(9(n-1)), reached when all four cosine 
values are -1.0. This situation happens when all corresponding angles 
differ by 180 degrees. However, if either O1 or O2 does not appear in P, the 
contribution factor from O2 is defined as 0. This means that as long as O2 
appears in P too, the contribution factor S12 from O2 must be greater than 0, 
which is a very important characteristic of the similarity measure. 
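
The bounds stated above can be checked numerically. The following is a minimal sketch of formula (1), assuming angles are given in degrees and each matrix entry is an (α, β) tuple; the function name `contribution_factor` and its argument layout are hypothetical.

```python
import math
from typing import Tuple

AnglePair = Tuple[float, float]


def contribution_factor(
    p12: AnglePair, p21: AnglePair, q12: AnglePair, q21: AnglePair, n: int
) -> float:
    """Formula (1): contribution of O2 to the similarity based on O1 (degrees)."""
    pairs = [
        (p12[0], q12[0]),  # alpha_12 in P and Q
        (p12[1], q12[1]),  # beta_12 in P and Q
        (p21[0], q21[0]),  # alpha_21 in P and Q
        (p21[1], q21[1]),  # beta_21 in P and Q
    ]
    total = sum(math.cos(math.radians(a - b)) + 1 for a, b in pairs) + 1
    return 100 * total / (9 * (n - 1))


# Identical angles give the maximum 100/(n-1); a 180-degree difference on
# every angle gives the minimum 100/(9(n-1)).
s_max = contribution_factor((210, 240), (-15, 15), (210, 240), (-15, 15), n=3)
s_min = contribution_factor((210, 240), (-15, 15), (30, 60), (165, 195), n=3)
```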

Now we may replace O2 by any other object appearing in Q. Then S1, the 
total similarity based on object O1, is the sum of all contributions from all 
other n-1 objects in Q. 

$$S_1 = \sum_{j=2}^{n} S_{1j}, \quad \text{where } n \text{ is the number of objects in } Q \qquad (2)$$ 

The maximum S1 is 100, and the minimum S1 is 0, reached when O1 is missing 
from P or all other n-1 objects are missing. The value of S1 indicates how 
spatially similar P is to Q based on O1. 

Suppose the AOSR matrices for Q and P are Mq and Mp respectively. We 
define a similarity function SV: 

$$SV(M_q, M_p) = (S_1, S_2, \ldots, S_n), \quad \text{where } n \text{ is the number of objects in } Q \qquad (3)$$ 

SV() uses Mq and Mp as inputs to compute the similarity degrees 
between Q and P based on each of the objects in Mq. 
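
A compact sketch of SV() follows, combining formulas (1) to (3). It assumes that an AOSR matrix is stored as a nested dictionary `matrix[reference][primary] -> (alpha, beta)` with angles in degrees and that the query contains at least two objects; the function name `similarity_vector` and this layout are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Dict, List, Optional, Tuple

Matrix = Dict[str, Dict[str, Optional[Tuple[float, float]]]]


def _cos_term(a: float, b: float) -> float:
    """Cosine of the angular difference (degrees), shifted to be nonnegative."""
    return math.cos(math.radians(a - b)) + 1


def similarity_vector(mq: Matrix, mp: Matrix) -> List[float]:
    """Return (S_1, ..., S_n) for the n objects of the query matrix mq."""
    names = sorted(mq)
    n = len(names)
    sv = []
    for oi in names:
        s_i = 0.0
        if oi in mp:  # S_i stays 0 when the reference object is missing from P
            for oj in names:
                if oj == oi or oj not in mp:
                    continue  # a missing object contributes 0
                terms = 1.0
                for ref, prim in ((oi, oj), (oj, oi)):
                    (aq, bq), (ap, bp) = mq[ref][prim], mp[ref][prim]
                    terms += _cos_term(ap, aq) + _cos_term(bp, bq)
                s_i += 100 * terms / (9 * (n - 1))  # formula (1)
        sv.append(s_i)  # formula (2)
    return sv  # formula (3)


# Picture P from Fig. 5 compared with itself and with a copy lacking object C.
m = {
    "A": {"A": None, "B": (210, 240), "C": (255, 270)},
    "B": {"A": (-15, 15), "B": None, "C": (240, 270)},
    "C": {"A": (45, 95), "B": (210, 235), "C": None},
}
m_without_c = {
    "A": {"A": None, "B": (210, 240)},
    "B": {"A": (-15, 15), "B": None},
}
sv_same = similarity_vector(m, m)
sv_partial = similarity_vector(m, m_without_c)
```

Comparing a picture with itself yields 100 for every object, while dropping C from the database picture halves the values for A and B and sets the value for C to 0.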

How to compute the spatial similarity between pictures by using the n 
similarity values produced by SV() is application dependent. Here are two 
possible approaches: 

1. Selecting some dominant objects in the query image to compare 
similarity. For example, in the query to find a house in front of which 
there are a temple and a swimming pool, both facing the house, the house is 
the dominant object. The relationship between the temple and the swimming 
pool is not mentioned and is assumed to be unimportant. In this case, not all 
values from SV() are useful. In a real application, it may not be 
necessary to compute all of S1 to Sn. This approach can trim the answer set of 
similar pictures very effectively. It is especially useful for similarity 
retrievals involving only a small number of dominant objects. 

2. Comparing the similarity based on the sum of all S1 to Sn. This approach 
is widely used for image retrieval without dominant objects. The result is 
a small set of candidate images ranked according to their similarity degrees. 
We implemented both retrieval approaches in the following experiments. 
5. EXPERIMENTS 

Similarity retrieval is always fuzzy and subjective. It is hard to find a 
widely accepted benchmark for similarity retrieval experiments. In our 
experiment, we try to construct our experimental database to include many 
different pictures organised into several groups. 

There are 647 pictures in our database. Each image contains 10 to 15 
objects. The pictures are divided into 5 groups according to the number of 
objects changed (moved, rotated, etc.) with respect to the 
query picture. There are 20, 45, 120, 210, and 252 pictures in groups 1, 2, 3, 
4, and 5 respectively. A picture is in group i when there are i objects 
changing their positions with respect to the query image. Those pictures with 
five or more objects changing their positions belong to group 5. 

For the first experiment, we randomly choose one unchanged object as the 
dominant object, and calculate for each group the average percentage of 
pictures whose similarity degree is more than 90, 93, 95, 98 and 99 
respectively. The results are shown in Table 1. 

For the second experiment, we calculate all S1 to Sn and sum them to 
get the top 10, the top 20, the top 30, and the top 40 pictures with the highest 
similarity, and compute the percentage distribution of the retrieved images 
over the groups. Table 2 shows the results of the experiment. 



Table 1. Experiment 1 results 

              The percentage of pictures retrieved (%) 
Similarity    Group 1       Group 2       Group 3       Group 4       Group 5 
>90           100.00        78.67         62.33         48.48         [illegible] 
>93           [illegible]   65.33         45.83         28.95         18.41 
>95           [illegible]   55.56         [illegible]   19.62         9.52 
>98           100.00        38.67         14.67         4.67          1.35 
>99           100.00        25.78         5.33          0.85          0.08 


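Table 2. Experiment 2 results 

              The distribution percentage among groups 
Ranks         Group 1       Group 2       Group 3       Group 4       Group 5 
10            100.00        0.00          0.00          0.00          0.00 
20            85.00         15.00         [illegible]   0.00          0.00 
30            [illegible]   26.67         6.67          [illegible]   0.00 
40            [illegible]   32.50         [illegible]   2.50          0.00 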



From Tables 1 and 2, we find that the percentage of pictures retrieved 
from groups with fewer object changes is always higher. When we relax the 
similarity degree requirement for experiment 1 or rank more pictures for 
experiment 2, more and more pictures are retrieved from groups 
other than the first group. This shows that even when many objects have 
shifted their positions, as long as the extent of their changes remains small, the 
pictures will be retrieved as similar to the query picture, yet they may be 
rejected when other algorithms such as (Nabil, 1996) are used. 

There are two factors to be considered when we determine whether a 
picture is similar to another. One is the number of objects that have changed 
their spatial relationships, and the other is the extent of these changes. Our 
experiment shows clearly that the number of objects that have changed their 
positions is not the only factor that affects the spatial similarity. Although in 
practice a picture in which many objects have changed their positions is more 
likely to be rejected in similarity retrieval, one can argue that it should be 
accepted if the extent of the changes is small. Our experimental results show that 
these factors complement each other. 



6. CONCLUSIONS 

In summary, we have proposed in this paper a similarity retrieval 
approach for the augmented orientation spatial relationship (AOSR) 
representation. Compared to existing systems, the proposed approach is not 
only rotation invariant, but also captures the relative distance and orientation 
range between objects. It overcomes the ambiguity problems that exist in 
other orientation representations, and is more flexible and applicable. 
Abstract Searching and managing large archives of visual data, such as images 
and video, is made hard by the lack of proper integration between the 
visual aspects of the problem (image processing, motion estimation, 
feature extraction...) and its database aspects (defining visual data as 
data types in a database). In this paper, we argue that image database 
languages can be built based on feature algebras, and demonstrate how 
such a feature algebra can be built in the case of one of the most popular 
image characterization techniques: histograms. 

Keywords: Histograms, Databases, Image Features, Algebras, Content Based Image 
Retrieval 



1. INTRODUCTION 

A primary capability of any database system is to provide a user the 
means to create, query and manipulate data in a natural, meaningful 
and expressive way. In traditional database systems (Abiteboul et al., 
1995), this is accomplished by using well-formed mathematical structures 
(such as sets or trees), and designing a language to create, constrain and 
manipulate data sets associated with these structures. The language, 
based on an algebra or calculus, is designed to express most queries 
that a user is likely to formulate on the mathematical structure for the 
domains of application. The implicit assumption behind the choice of 
the mathematical structure and the operators of the language is that 
they fit naturally to the way a user would think about the data. 

Unfortunately, in query and data manipulation languages designed for 
visual (image and video) database systems, a balance between meaning- 
fulness and expressive power has been hard to achieve. We believe that 
one primary reason for this difference is that the visual scientist’s focus 
is often targeted toward problem-specific data transformation, while the 
database scientist’s focus is on generic methods to formalize relation- 
ships between data, and to access data from well-structured collections. 
This difference in focus, we contend, has led to an “expressiveness gap” 
in database languages for visual information systems. We illustrate the 
problem in the following paragraphs. 



Consider a hypothetical database system created by assimilating “best 
practices” from current visual database research. Let us assume that the 
system supports both image and video retrieval. 

We also start with the assumption that each image or video item is 
identified by an id, and Id denotes the type of all identifiers in the system. 
Being a characteristic system, it will typically have a set of global 
visual features $\{g_i\}$, computed by a series of image transforms, reducing 
the original image to a collection of numbers, the so-called feature vectors 
(Gupta, 1995). These feature vectors represent different perceptual 
or cognitive properties of an image such as color, texture, or camera 
motion properties of videos, and can be designed using sophisticated 
techniques to attain properties like invariance to affine transformations 
and illumination disparities. However, despite the fact that each feature 
vector is a collection, the user does not usually have access to any value 
“inside” this collection, and has to treat the collection as the instance 
of an opaque data type $T_f$. In fact, most often the user does not even 
have access to the value of the feature. The type has only one binary 
operation $o_f : T_f \times T_f \to \mathbb{R}$, producing a distance between two instances 
of type $T_f$. Depending on $T_f$, $o_f$ may be commutative (e.g., when $o_f$ is 
a norm such as $L_2$), but it need not be (Santini and Jain, 1999), and 
the associativity of the operator is not considered important. Since the 
distance operation produces a real number called score, the database 
system often compares one example image with others in the database 
and produces $[Id \times \mathbb{R}]$, a rank-ordered list of {id, score} pairs, as the 
result type of a search. 
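
The search model described here can be illustrated with a short sketch. It assumes that features are plain numeric vectors and uses the Euclidean distance as one possible choice for the distance operation; the function `search` and the sample database are hypothetical. The point is that the caller only ever sees {id, score} pairs, never the inside of a feature.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Feature = Sequence[float]


def search(
    example: Feature,
    database: Dict[str, Feature],
    distance: Callable[[Feature, Feature], float],
) -> List[Tuple[str, float]]:
    """Return a rank-ordered list of (id, score) pairs, smallest distance first."""
    scored = [(image_id, distance(example, feature)) for image_id, feature in database.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical global feature vectors for three images.
database = {
    "img1": [0.0, 0.0],
    "img2": [3.0, 4.0],
    "img3": [1.0, 0.0],
}
ranking = search([0.0, 0.0], database, math.dist)
```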

As the system supports n independent global image features, we can 
assume without loss of generality that these features form a relation. So 
the collection of images and their properties can be viewed as a relational 
database with tuples having the type $T_f \times T_f \times \dots \times Id$. The system 
offers a composite distance between two images by computing a tuple-wise 
difference. This tuple uses a combination function, which (Fagin, 1999) 
calls a scoring rule, over the tuple of {id, score}i pairs obtained from 
each of the n features. Thanks to the wide body of recent research on 
rules and formal techniques to express and compute this combination 
function (Fagin and Wimmers, 1999; Fagin, 1999; Adali et al., 1998; 
Nepal and Ramakrishna, 1999) our hypothetical system will have a rich 
collection of ways to use aggregate ranking functions. 
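
As a rough illustration of a scoring rule, the sketch below merges per-feature score tables with an arbitrary aggregation function. It assumes scores are distances (lower is better) and that each feature yields a dictionary from image id to score; the function `combine` and the sample scores are hypothetical, and the two rules shown (maximum and arithmetic mean) are simply common examples of aggregation functions rather than the rules of any particular system.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


def combine(
    per_feature: Sequence[Dict[str, float]],
    rule: Callable[[List[float]], float],
) -> List[Tuple[str, float]]:
    """Apply a scoring rule to the per-feature scores of each image and re-rank."""
    # Only images scored by every feature can be combined.
    ids = set.intersection(*(set(scores) for scores in per_feature))
    combined = {i: rule([scores[i] for scores in per_feature]) for i in ids}
    # Sort by combined score, breaking ties by id for a deterministic order.
    return sorted(combined.items(), key=lambda kv: (kv[1], kv[0]))


# Hypothetical distances produced by a color feature and a texture feature.
color = {"img1": 0.1, "img2": 0.5, "img3": 0.3}
texture = {"img1": 0.9, "img2": 0.2, "img3": 0.3}

by_max = combine([color, texture], max)    # penalizes the worst feature
by_mean = combine([color, texture], mean)  # averages the features
```

The two rules already disagree on this small example about which image comes second, which is the kind of choice the aggregate ranking literature cited above addresses.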

Thus, given the lack of access to the value or the structure of a feature, 
the system treats individual features as a “black box” with very poor 
support. However, it provides a wide variety of utilities when the features 
are put in a relation and when the tuple distance function produces a 
list of scores. 

The situation improves somewhat for local visual features. A local 
visual feature type is defined for an image as a composite type $T_f \times \mathbb{R}^2$, 
where the first component arises from image transformations and 
the second component localizes the feature in a region of the image. For a 
local video feature, the composite type is given by $T_f \times \mathbb{R}^2 \times \tau$, where 
$\tau$ stands for the time when the feature occurs in the video¹. We call 
these locality components the spatial support and the temporal support 
of the feature respectively. Given the composite nature of local visual 
features, the “support” component of the feature can be projected out. 
This allows our hypothetical system to characterize the spatial support 
region into known spatial data types, such as regions, lines and points. 
Now, the system can provide a rich set of query operations defined on 
spatial and temporal intervals and on binary topological relations that 
can be derived from spatial support data (Li et al., 1997; Li et al., 
1998; Day et al., 1998). Again we have the same problem that the 
feature part of the data is significantly weak and unexpressive compared 
to the structured portion of the data. 

1.1. CONTRIBUTION OF THE PAPER 

In this paper, we attempt to show that it is possible to reduce this 
expressiveness gap by defining mathematical structures for several feature 
classes in a data- and process-independent fashion. To this end, we first 
identify a set of properties that such a feature algebra will need to satisfy. 
Then we develop a special case of feature algebra by treating histograms 
as a generic mathematical structure. We illustrate that this allows us 
to perform manipulations and express query classes on visual data that 
could not be expressed unless a feature algebra were defined. 
2. SOME PROPERTIES OF A FEATURE ALGEBRA 

We classify features into global, local and relational. A local feature 
is characterized either by an explicit spatial or temporal support, or by 
an implicit encoding of spatial (or temporal) information in a globally 
computed feature (e.g., color correlograms (Huang et al., 1997)). Relational 
features are often represented as graphs having atomic attributes 
or computed features at the nodes and edges. In this section we discuss 
the structural representation of a generic feature that may be global, the 
non-support component of a local property, or embedded in the node of 
a relational attribute. 

Definition 2.1 A feature is a collection C of values, containing arbitrarily 
nested subcollections S(C) indexed by an algebraic structure E 
defined over N, the set of natural numbers. 

Example 1. Structurally, a vector feature V can be considered to 
be a one-dimensional array, and defined as a set of real values, indexed 
by a list of natural numbers, such that V[i] is the i-th element of V. 

Example 2. Many complex features such as texture are often 
represented by a hierarchical bank of filters applied to an image. For 
example, (Hatipoglu et al., 1999) describes a dual-tree complex wavelet 
feature tree for texture determination. In this case each $S_i(C)$ is formed 
by the coefficients of an individual filter bank, with $C = \bigcup_i S_i(C)$. E represents 
an extended tree structure such that the node of the tree indexes a 
filter bank, and a list within a node specifies a single coefficient within 
a filter bank. 

While the operators in a specific feature algebra must depend on the 
exact nature of C and E, we can identify a core set of properties that 
most feature algebras will satisfy. One set of operators is governed by the 
domain of feature values (e.g., integer, real), and is outside the algebra 
itself. The algebra would include operators such as the following: 

selectCollection. Chooses a specific collection from a database. This 
can be accomplished by a predicate on some attribute of the col- 
lection such as its name, or cardinality. 

pickCollection. In case of collections containing subcollections, this 
operation selects a subcollection by a path expression from the "root" 
of the collection. 

nextSubCollection. Applicable for nested collections, this operation 
relies on the premise that the structure E provides a traversal 
functionality. Thus, given a subcollection, this operator selects the 
next subcollection in the traversal order prescribed by E. 

selectElement. This is a classical select operation of a database 
system. 

pickElement. This is a selection by path expression starting from the 
root of a subcollection containing the element. Typically, the path 
expression will be prefixed by the path expression leading up to 
the subcollection. 

nextElement. Similar to the nextSubCollection operation, this 
operator traverses from one element to the following element. 

getElementValue. Given an element, this operator returns its value. 
In some cases this value may be complex. For example, for a 3D 
object it may return the two 3-vector principal curvature directions 
of a surface. 

compareElement. This corresponds to the mapping 
$T_f \times T_f \to \mathbb{R}$ mentioned before and produces an element 
distance. 

makeCollection. The operation creates a new (sub)collection from 
zero or more existing (sub)collections, possibly by applying a 
function $f$ to them. An example of such a function could be creating 
a new histogram by computing a termwise difference of two given 
histograms. 

placeSubCollection. This operation positions a newly constructed sub- 
collection into a specific point in the structure of a larger collection. 
It is based on the premise that the structure E allows a systematic 
traverse and insert functionality. 
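
As a rough illustration of the operators listed above, the following Python sketch models a feature as a nested list and implements simplified versions of pickCollection, nextSubCollection and makeCollection. The nested-list representation, the 1-based path convention, the pre-order traversal and all function names are assumptions made for this example; the algebra itself leaves the structure E abstract.

```python
# Assumption: a feature is a nested Python list; inner lists are subcollections.

def pick_collection(root, path):
    """Select a subcollection by a path of 1-based child positions from the root."""
    node = root
    for step in path:
        node = node[step - 1]
    return node

def subcollection_paths(node, prefix=()):
    """Yield the paths of all nested subcollections in pre-order (assumed traversal)."""
    for pos, child in enumerate(node, start=1):
        if isinstance(child, list):
            yield prefix + (pos,)
            yield from subcollection_paths(child, prefix + (pos,))

def next_subcollection(root, path):
    """Return the path that follows `path` in traversal order, or None at the end."""
    paths = list(subcollection_paths(root))
    i = paths.index(tuple(path))
    return paths[i + 1] if i + 1 < len(paths) else None

def make_collection(f, *collections):
    """Build a new flat collection by applying f termwise to existing ones."""
    return [f(*values) for values in zip(*collections)]

# Two filter banks; the second one is itself split into two sub-banks.
feature = [[1.0, 2.0], [[3.0], [4.0, 5.0]]]
```

Here `pick_collection(feature, (2, 2))` selects the innermost bank `[4.0, 5.0]`, and `make_collection(lambda a, b: a - b, h1, h2)` gives the termwise difference mentioned for makeCollection.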

From this generic set of operations we now illustrate a concrete in- 
stance of a feature algebra, applied to the case of histograms. 

3. THE HISTOGRAM ALGEBRA 

A histogram is a frequency distribution of one or more variables over 
a set of observations. Without loss of generality, we may state that 
given any measurement function $f : R \to \mathbb{R}$, where $R$ is the domain 
of definition of an image (usually $R \subset \mathbb{R}^2$), a histogram represents the 
probability distribution of the values of $f$. The goal of our algebra is to 
preserve this probabilistic character of the histogram, instead of treating 
it like an array (Libkin et al., 1997; Marathe and Salem, 1997), where 
no correlation between different cells of an array can be assumed. 







Before describing the histogram data type and the operations on 
it, we will need a few accessory functions and definitions. The data 
types boolean, integer, real, as well as arrays of these types are 
assumed. Integers can be used to form ranges, such as $1 : n$. If $a$ is 
an array, $a[i]$ (or, equivalently, $a_i$) is its $i$th element, and $a[1 : n]$ (or 
$a_{1:n}$) is an array composed of its first $n$ components. Ranges can be 
$k$-dimensional, and a range can be assigned to variables of type range, 
but no other operations are defined on them. The $k$-dimensional range 
$\{d_{1i} \le x_i \le d_{2i},\ i = 1, \ldots, k\}$ will be represented using the notation 
$[d_{1i}, d_{2i}]^k$, where $d_{1i}$ and $d_{2i}$ are one-dimensional arrays. 

A probability distribution function is a mapping from real numbers 
to the interval (0,1) obeying the laws of probability. 

The generation function $g_p([d_{1i}, d_{2i}]^k, p_i)$ generates $n$ real numbers 
in the $k$-dimensional range $[d_{1i}, d_{2i}]^k$ according to the probability 
distribution function given by $p_i$. The number of buckets in each dimension 
is specified by $n \in \mathbb{N}^k$. 

A bucketing function $b_{n,D,\Delta} : D \to [1 : n]^k$, where $D$ is a range 
$[d_{1i}, d_{2i}]^k$ and $\Delta$ is a $k$-dimensional array of positive numbers, is a 
function that maps the $k$-dimensional range $[d_{1i}, d_{2i}]^k$ into the integer range 
$[1 : n]^k$. An element of this range is called a bucket. The semantics of 
the bucketing function is as follows. Let $X = [x_1, \ldots, x_k]$ be a $k$-dimensional 
array, and $h_i$ such that 

$$\left\lfloor \frac{x_i - d_{1i}}{\Delta_i} \right\rfloor + 1 = h_i$$ (1) 

then 

$$b_{n,D,\Delta}(X) = [h_1, h_2, \ldots, h_k]$$ (2) 
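
A bucketing function of this kind can be sketched in a few lines of Python. The sketch assumes that the array of positive numbers holds the bucket width along each dimension and that buckets are numbered from 1; the clamping of points that fall exactly on the upper boundary is an extra assumption of the example, as are the function and parameter names.

```python
import math

def bucket(x, lower, width, n):
    """Map a k-dimensional point x to a list of 1-based bucket indices.

    lower[i] is the lower bound of the range along dimension i,
    width[i] the (assumed) bucket width, n[i] the number of buckets.
    """
    index = []
    for xi, lo, w, ni in zip(x, lower, width, n):
        h = math.floor((xi - lo) / w) + 1
        # Assumption: clamp so that the upper boundary lands in the last bucket.
        index.append(min(max(h, 1), ni))
    return index
```

For example, with unit-width buckets starting at 0, the point `[0.0, 2.5]` falls into bucket `[1, 3]`.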

Definition 3.1 A histogram is a mapping 
$H : [1 : n_1] \times \cdots \times [1 : n_k] \to (\mathbb{R}^+)^m \cup \{\mathit{null}\}$, 
where $k$ is the dimension of the histogram, $n_i$ is the bucket size along 
the $i$-th dimension of the histogram, and $m$ is the codomain dimension 
of the histogram. 

While by definition the codomain dimension of a histogram is always 1, if two 
distinct histograms H1 and H2 have exactly the same dimensions, 
domain, and bucket sizes, we represent them in a compressed histogram 
with a codomain 2. To see where such a histogram may be used, consider 
a 2D edge-orientation histogram with the two dimensions representing 
the direction of the orientation and the strength of the edge respectively. 
We may now want to perform a smoothing operation along each of the 
dimensions, thereby computing at each cell two values. We represent the 
$k$-th value in the $ij$-th cell of the composite 2D histogram $H'$ as $H'[i][j][k]$, 
as if it had an additional dimension. For a simple (non-composite) 2D 
histogram $H$, $H[i][j][k]$ would produce the value err. 

An important decision in our algebra is that we strictly enforce 
dimensionality of histograms. In linear algebra it is quite common to identify 
a column vector with a matrix with only one column, or a row vector 
with a matrix with only one row. For that matter, it is possible to see 
a vector as an array of any dimensionality in which all dimensions but 
one have only one element. The same identification is possible with his- 
tograms: a single dimensional histogram can be seen as a two (or three, 
or four...)-dimensional histogram with only one bucket in the second di- 
rection. This identification is explicitly prohibited in our model. The 
dimensionality of a histogram is a well-defined attribute irrespective of 
the number of buckets along any dimension. Two histograms are called 
isomorphic iff they have the same dimensionality, codomain dimension- 
ality, bucket dimensionality along every dimension, and if their domains 
coincide. 
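
The isomorphism test can be written down directly. The following Python sketch is illustrative only: the `Shape` record and its field names are hypothetical and simply collect the four attributes named in the definition.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Shape:
    """Structural description of a histogram (hypothetical helper type)."""
    sizes: Tuple[int, ...]                    # number of buckets per dimension
    domain: Tuple[Tuple[float, float], ...]   # (lower, upper) bound per dimension
    codomain_dim: int = 1

    @property
    def dim(self) -> int:
        # Dimensionality is explicit: one entry per dimension, even if it has one bucket.
        return len(self.sizes)

def isomorphic(a: Shape, b: Shape) -> bool:
    """True iff dimensionality, codomain dimension, bucket counts and domains agree."""
    return (a.dim == b.dim
            and a.codomain_dim == b.codomain_dim
            and a.sizes == b.sizes
            and a.domain == b.domain)
```

Under this check a one-dimensional histogram with four buckets is not isomorphic to a two-dimensional histogram of size 4 x 1, which mirrors the strict treatment of dimensionality described above.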



Constructors. The following operators build histograms starting 
from other data types. 

Let $A$ be a $k$-dimensional array. The operator build($A$) constructs 
a $k$-dimensional histogram with codomain dimension equal to 1 such that 
$H(i_1, \ldots, i_k) = A[i_1, \ldots, i_k]$. The operator build($v, i_1, \ldots, i_k$) builds a 
$k$-dimensional histogram with co-domain dimension equal to the dimension 
of the array $v$, and $i_l$ buckets along the $l$th dimension. The histogram 
is initialized to the map 

$$H(j_1, \ldots, j_k) = \begin{cases} \ldots & \text{if } j_1 = i_1 \wedge j_2 = i_2 \wedge \cdots \wedge j_k = i_k \\ \ldots & \text{otherwise} \end{cases}$$ (3) 



We will also consider two special operators. The first, $G_p$, constructs 
a histogram based on a given function. Its semantics is 

$$G_p([d_1, d_2]^k, f, n^k) = \mathit{build}(g_p([d_1, d_2]^k, f))$$ (4) 

The operator null($n_1, \ldots, n_k$) builds a $k$-dimensional histogram with $n_i$ 
buckets along the $i$th dimension implementing the null mapping. 



Histogram functions. The following functions compute quantities 
related to a histogram without altering their argument. 

dim($H$) returns the dimension of the histogram. 

$\mathit{dom}_i(H)$ returns the domain of the histogram in the $i$-th dimension. 

Dom($H$) is a macro that returns the complete domain of the histogram. 

$\mathit{size}_i(H)$ returns the number of buckets in the histogram for the $i$-th 
dimension. 

Size($H$) is a macro that returns an array $[n_1, \ldots, n_k]$ denoting the 
number of buckets along every dimension of the histogram. 

val($H(i_1, \ldots, i_k)$) returns the value of the bucket at the specified index. 

bounds($H(i_1, \ldots, i_k)$) returns a $k$-tuple of pairs $[[u_1, l_1], \ldots, [u_k, l_k]]$, where 
$[u_j, l_j]$ are the upper and lower bounds on the domain of the bucket 
$[i_1, \ldots, i_k]$ for the $j$-th dimension. 

For convenience, we also define the predicate eq($H[i_1, \ldots, i_k] = \mathit{const}$), 
which evaluates to true iff val($H[i_1, \ldots, i_k]$) $= \mathit{const}$, and the macro 
TotCount($H$), which computes the sum of val($H[i_1, \ldots, i_k]$) over all 
buckets. 

Same size operators. The following operators combine two his- 
tograms of the same dimension, codomain dimension and bucket size, 
or operate on a histogram without altering its structure. The operators 
are undefined when applied to histograms that differ on any of these 
dimensions. 

The macro norm($H$) normalizes a histogram so that, if $G = \mathit{norm}(H)$, 

$$\sum_{i_1, \ldots, i_k} G(i_1, \ldots, i_k) = 1$$ (5) 

and, for all the values for which the operation is defined 

The operator $+$ denotes addition of two histograms. $H_1 + H_2$ has the 
following semantics: 

$$(H_1 + H_2)(i_1, \ldots, i_k) = H_1(i_1, \ldots, i_k) + H_2(i_1, \ldots, i_k)$$ (7) 

The operator $-$ denotes absolute value difference of two histograms. 
$H_1 - H_2$ has the following semantics: 

$$(H_1 - H_2)(i_1, \ldots, i_k) = |H_1(i_1, \ldots, i_k) - H_2(i_1, \ldots, i_k)|$$ (8) 

These operators are special cases of the general element-wise 
combination operators $(\cdot)$ defined as follows. Let 
$f : (\mathbb{R}^+)^m \times (\mathbb{R}^+)^m \to (\mathbb{R}^+)^m$ be 
a symmetric and associative operator. Then, for histograms $H_1$ and $H_2$ 
with $m$-dimensional co-domain, $H_1 (f) H_2$ has the following semantics: 

$$(H_1 (f) H_2)(i_1, \ldots, i_k) = f(H_1(i_1, \ldots, i_k), H_2(i_1, \ldots, i_k))$$ (9) 
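
The same-size operators all follow one pattern, which the Python sketch below makes explicit. It assumes a histogram is stored as a dictionary from 1-based index tuples to values (codomain dimension 1); that representation and the function names are choices made for the example only.

```python
def combine(f, h1, h2):
    """Element-wise combination H1 (f) H2 of two histograms of identical structure."""
    if h1.keys() != h2.keys():
        # The operators are undefined for histograms that differ in structure.
        raise ValueError("histograms differ in dimension or bucket size")
    return {idx: f(h1[idx], h2[idx]) for idx in h1}

def add(h1, h2):
    """The + operator: bucket-wise sum."""
    return combine(lambda a, b: a + b, h1, h2)

def absdiff(h1, h2):
    """The - operator: bucket-wise absolute difference."""
    return combine(lambda a, b: abs(a - b), h1, h2)

def norm(h):
    """Scale the histogram so that its buckets sum to 1."""
    total = sum(h.values())
    return {idx: value / total for idx, value in h.items()}
```

Both `add` and `absdiff` are obtained from `combine` by fixing a symmetric function, which is the sense in which they are special cases of the general operator.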

The operator == denotes assignment and has the usual semantics: 

$$H_1[i_1, \ldots, i_k] = C \;\Rightarrow\; \mathit{eq}(H_1[i_1, \ldots, i_k] = C) = \mathit{true}$$ (10) 

where $C$ is a constant. 

Field operators. These operators combine a histogram and a 
real number. The operator $\cdot$ denotes scalar multiplication by a 
positive scalar constant. The operator is not defined for $c < 0$. Thus 
$c \cdot H$ (abbreviated $cH$) has the semantics: 

$$(c \cdot H)(i_1, \ldots, i_k) = c\,H(i_1, \ldots, i_k)$$ (11) 

Similarly to the previous case, this operator is a special case of the 
general operator $(\cdot)$, defined as follows. Let 
$f : \mathbb{R} \times (\mathbb{R}^+)^m \to (\mathbb{R}^+)^m$ be a 
function; then the semantics of $c\,(f)\,H$ is 

$$(c\,(f)\,H)(i_1, \ldots, i_k) = f(c, H(i_1, \ldots, i_k))$$ (12) 

Cross Dimensional Operators. These operators change the 
dimensionality or the number of buckets of a histogram. The expand 
operator increases the dimensionality of a histogram. Its signature is 
$G = \mathit{expand}(H, n, a)$, where $H$ is a $k$-dimensional histogram, $a$ is a 
$k$-dimensional vector of integers such that $a_i \in [1, n]$ and $i \ne j \Rightarrow a_i \ne a_j$, 
and the result is an $n$-dimensional histogram. 

The formal specification of the operator (see below) is rather involved, 
but its semantics is actually quite simple. For example, consider a two 
dimensional histogram $H$. Then the operation $G = \mathit{expand}(H, 3, [3, 1])$ 
builds a three dimensional histogram such that the first dimension of $H$ 
becomes the third dimension of $G$, the second dimension of $H$ becomes 
the first dimension of $G$, and all other dimensions of $G$ (in this case the 
second dimension) have only one bin. In this case, the relation between 
$G$ and $H$ is: 



$$\mathit{size}_1(G) = \mathit{size}_2(H)$$ (13) 

$$\mathit{size}_2(G) = 1$$ (14) 

$$\mathit{size}_3(G) = \mathit{size}_1(H)$$ (15) 

$$G(i, 1, j) = H(j, i)$$ (16) 
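
The worked example can be checked with a small Python sketch. It assumes a histogram is a dictionary from 1-based index tuples to values; the function name and argument order follow the signature in the text, but the implementation is only one possible reading of it.

```python
def expand(h, n, a):
    """Re-index a k-dimensional histogram into n dimensions.

    Dimension p of h (1-based) becomes dimension a[p-1] of the result;
    every dimension of the result not named in a has a single bucket (index 1).
    """
    if len(set(a)) != len(a) or not all(1 <= q <= n for q in a):
        raise ValueError("a must hold distinct positions in [1, n]")
    g = {}
    for idx, value in h.items():
        new_idx = [1] * n
        for p, q in enumerate(a):
            new_idx[q - 1] = idx[p]
        g[tuple(new_idx)] = value
    return g

# A 2 x 3 histogram whose bucket (i, j) holds the value 10*i + j.
H = {(i, j): 10 * i + j for i in (1, 2) for j in (1, 2, 3)}
G = expand(H, 3, [3, 1])
```

With `a = [3, 1]`, the bucket `(i, j)` of `H` ends up at `(j, 1, i)` in `G`, so `G` has three buckets along its first dimension, one along its second and two along its third.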



In order to define the behavior of the operator in more general 
circumstances, we need a few auxiliary definitions. Let 
$I = [i_1, \ldots, i_k] \in \mathbb{N}^k$ be a $k$-dimensional index, and 
$J = [j_1, \ldots, j_n] \in \mathbb{N}^n$ an $n$-dimensional 
index. Let $u_a : \mathbb{N}^k \to \mathbb{N}^n$ be the transformation defined as 

$$(u_a(I))_q = \begin{cases} i_p & \text{if } q = a_p \\ 1 & \text{otherwise} \end{cases}$$ (17) 

Consider now the set of $n$-indices that have a "1" in the locations not 
covered by the vector $a$: 
$X_a = \{[j_1, \ldots, j_n] \in \mathbb{N}^n \mid (\nexists p : a_p = q) \Rightarrow j_q = 1\}$. 
Then the transformation $u_a$ is an isomorphism between $\mathbb{N}^k$ and $X_a$ and 
therefore invertible on this domain: $u_a^{-1} : X_a \to \mathbb{N}^k$. The relation 
between the histograms $H$ and $G$ can then be defined as follows: 

$$\forall I \in X_a \quad G(I) = H(u_a^{-1}(I))$$ (18) 



The embed operator substitutes part of a larger histogram with values 
from a smaller histogram with the same dimensionality, starting at a 
location $I = [i_1, \ldots, i_n]$ in the larger histogram. The operator signature 
is $Q = \varepsilon_I(H, G)$, where $H$ is embedded into $G$. If for some $j$ we have 
$i_j < 0$ or $i_j + \mathit{size}_j(H) > \mathit{size}_j(G)$, the histogram $H$ will be clipped, 
and only the values with indices within the legal range of $G$ will be used. 
The semantics of the operator is 

$$(\varepsilon_I(H, G))(J) = \begin{cases} H([j_1 - i_1, \ldots, j_n - i_n]) & \text{if } \forall h\ \ 0 \le j_h - i_h < \mathit{size}_h(H) \\ G([j_1, \ldots, j_n]) & \text{otherwise} \end{cases}$$ (19) 

where $J = [j_1, \ldots, j_n]$. 
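
A small Python sketch of embedding with clipping follows. It assumes dictionary-based histograms keyed by 1-based index tuples and treats the location as an offset added to each index of the smaller histogram; this offset convention is an assumption of the example and may differ by one from the convention intended in equation (19).

```python
def embed(h, g, offset):
    """Return a copy of g in which bucket J holds h[J - offset] wherever that exists.

    Buckets of h that would land outside g are clipped (silently dropped).
    """
    out = dict(g)
    for idx, value in h.items():
        target = tuple(j + o for j, o in zip(idx, offset))
        if target in g:          # keep only indices inside the legal range of g
            out[target] = value
    return out

zeros = {(j,): 0 for j in range(1, 6)}      # null-like background of size 5
small = {(1,): 7, (2,): 8, (3,): 9}
```

Embedding `small` with offset `(3,)` writes 7 and 8 into buckets 4 and 5 and clips the 9, while a negative offset clips from the left.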

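The select operator $\sigma_\theta(H)$ selects those bins of histogram $H$ that 
satisfy the predicate $\theta$. Its semantics is given by: 

$$(\sigma_\theta(H))[i] = \begin{cases} H[i] & \text{if } \theta(H[i]) = \mathit{true} \\ \mathit{null} & \text{otherwise} \end{cases}$$ (20) 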
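
In code, selection keeps the structure of the histogram and only blanks out the rejected bins. The sketch below assumes dictionary-based histograms and uses Python's `None` to stand in for the null value; both are choices made for this example.

```python
def select(h, theta):
    """Keep the buckets whose value satisfies theta; set all others to None (null)."""
    return {idx: (value if theta(value) else None) for idx, value in h.items()}
```

Because every index is retained, the result is still isomorphic to the input and can be combined with it using the same-size operators.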



The project operator takes a symmetric associative operator 
$\oplus : (\mathbb{R}^+)^m \times (\mathbb{R}^+)^m \to (\mathbb{R}^+)^m$ and uses it to compress the $h$th dimension of 
the histogram $H$. The semantics of the operator is the following: 

$$(\pi_{h,\oplus}(H))(i_1, \ldots, i_{h-1}, i_{h+1}, \ldots, i_n) = \bigoplus_{j=1}^{\mathit{size}_h(H)} H(i_1, \ldots, i_{h-1}, j, i_{h+1}, \ldots, i_n)$$ (21) 
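
Projection is a fold along one dimension. The following Python sketch assumes dictionary-based histograms with 1-based index tuples; `project` and its parameter names are illustrative.

```python
from functools import reduce
import operator

def project(h, dim, op):
    """Collapse dimension `dim` (1-based) of h by folding its buckets with op."""
    groups = {}
    for idx in sorted(h):
        key = idx[:dim - 1] + idx[dim:]          # drop the projected coordinate
        groups.setdefault(key, []).append(h[idx])
    return {key: reduce(op, values) for key, values in groups.items()}

H = {(1, 1): 1, (1, 2): 2, (2, 1): 3, (2, 2): 4}
column_sums = project(H, 1, operator.add)   # sum over the first dimension
row_maxima = project(H, 2, max)             # maximum over the second dimension
```

Projecting a one-dimensional histogram leaves a single value under the empty index `()`, which is how a scalar such as a total count can be obtained.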

Next, the traversal operator $T_f(H)$ takes as an argument an index 
transform $f : \mathbb{N}^k \to \mathbb{N}^n$ ($n > k$) and uses it to reduce the dimensionality 
of the histogram $H$ and transform its indices: 

$$(T_f(H))(I) = H(f(I))$$ (22) 

where $I \in \mathbb{N}^k$ is an index of the new histogram. As an example, consider 
the two dimensional histogram of Fig. 1 and the index transform defined 
by Table 1, corresponding to the traversal of Fig. 1.b. The traversed 
histogram is shown in Fig. 1.c. 
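
As a concrete case, a row-by-row traversal turns a two-dimensional histogram into a one-dimensional one. The Python sketch below assumes dictionary-based histograms; the row-major transform is an example chosen here and is not the transform of Table 1.

```python
from itertools import product

def traverse(h, f, new_sizes):
    """Build the histogram T_f(H): bucket I of the result holds h[f(I)]."""
    indices = product(*(range(1, s + 1) for s in new_sizes))
    return {idx: h[f(idx)] for idx in indices}

H = {(1, 1): "a", (1, 2): "b", (2, 1): "c", (2, 2): "d"}

def row_major(index):
    """Map a 1-D index 1..4 to the (row, column) it visits in a 2 x 2 histogram."""
    (i,) = index
    return ((i - 1) // 2 + 1, (i - 1) % 2 + 1)

flat = traverse(H, row_major, (4,))
```

Other traversal orders, such as a zig-zag or a space-filling curve, only require a different index transform.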

Figure 1 The traversal operator. 

Table 1 

Figure 2 An example of re-bucketing. 

Finally, the rebucket operator changes the number of buckets along all 
the dimensions of a histogram, re-distributing the data inside a bucket 
according to a given probability density. Consider, for instance, the 
one-dimensional histogram in Fig. 2. The histogram has three buckets, and 
we want to expand it to four using a uniform underlying probability. 
The rebucket operator works as follows: 

1 Transform the histogram into a statistical sampling using the 
existing buckets and the given probability. In other words: the 
histogram was obtained by sampling a probability distribution and, 
according to the result, there were two samples in the interval 
covered by the first bucket, three samples in the interval covered by 
the second bucket, and three samples in the interval covered by 
the third bucket. Inside the buckets, the samples are distributed 
according to the given distribution (uniformly, in this case). 

2 The statistical distribution is re-sampled with the new number of 
buckets, four in this case. 

The operator has the form $\beta(H, q, p)$, where $H$ is an $n$-dimensional 
histogram, $q \in \mathbb{N}^n$ specifies the number of buckets in the final histogram, 
and $p$ is a probability density function 
$p : \mathcal{H} \times \mathbb{N}^n \times \mathbb{R}^n \to [0, 1]$ that 
specifies the distribution of the samples inside a bucket. Note that $p$ 
can depend on the histogram and the index of the bucket that we are 
expanding. 
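
For the uniform case of the example (counts 2, 3 and 3 in three buckets, re-bucketed to four), the two steps reduce to measuring how much of each old bucket overlaps each new one. The Python sketch below covers only this one-dimensional, uniform-density case; the function name is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def rebucket_uniform(counts, q):
    """Redistribute 1-D bucket counts over q equal-width buckets.

    Assumes the samples are spread uniformly inside each original bucket.
    Positions are measured in units of one original bucket width.
    """
    n = len(counts)
    result = []
    for j in range(q):
        lo, hi = Fraction(j * n, q), Fraction((j + 1) * n, q)
        mass = Fraction(0)
        for i, count in enumerate(counts):
            overlap = min(hi, i + 1) - max(lo, i)   # shared length with old bucket i
            if overlap > 0:
                mass += overlap * count
        result.append(mass)
    return result

new_counts = rebucket_uniform([2, 3, 3], 4)
```

The total count is preserved by construction; a non-uniform density would replace the overlap length by the integral of the density over the overlap.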

4. COMPUTING WITH THE HISTOGRAM 
ALGEBRA 

We now illustrate how the algebra can be used to compute useful 
operations. 



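Sum of squares of a histogram. 

$$\mathit{sq}(H) = S(H\,(\cdot)\,H)$$ (23) 

where 

$$S(H) = \begin{cases} \pi_{1,+}(H) & \text{if } \mathit{dim}(H) = 1 \\ S(\pi_{1,+}(H)) & \text{otherwise} \end{cases}$$ (24) 

The recursion is well defined since for all histograms $H$, 
$\mathit{dim}(\pi_{1,+}(H)) = \mathit{dim}(H) - 1$. 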



Computation of the L2 distance between histograms. 

$$L_2(H_1, H_2) = \sqrt{\mathit{sq}(H_1 - H_2)}$$ (25) 
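
The recursion and the distance built on it can be traced in Python. The sketch assumes dictionary-based histograms keyed by index tuples; the helper names mirror the symbols in the text but are otherwise arbitrary.

```python
import math

def project_sum(h):
    """pi_{1,+}: sum out the first dimension of h."""
    out = {}
    for idx, value in h.items():
        out[idx[1:]] = out.get(idx[1:], 0) + value
    return out

def S(h):
    """Sum all buckets by projecting away one dimension at a time."""
    dim = len(next(iter(h)))
    reduced = project_sum(h)
    return reduced[()] if dim == 1 else S(reduced)

def sq(h):
    """Sum of squares: S applied to the bucket-wise product of h with itself."""
    return S({idx: value * value for idx, value in h.items()})

def l2(h1, h2):
    """L2 distance: square root of sq of the absolute difference histogram."""
    return math.sqrt(sq({idx: abs(h1[idx] - h2[idx]) for idx in h1}))
```

Each call to `project_sum` lowers the dimension by one, so `S` terminates after as many steps as the histogram has dimensions.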



Computation of the Hue histogram from the RGB histogram. 

We assume that the functions $h(r, g, b)$, $s(r, g, b)$, $v(r, g, b)$ transform an 
r, g, b color into the corresponding hsv color. From these functions, given a 
histogram with $n$ bins on each color axis, an index for the corresponding 
hsv color can be computed by the function 

$$f([i, j, k]) = \left[\, n\,h\!\left(\tfrac{i}{n}, \tfrac{j}{n}, \tfrac{k}{n}\right),\; n\,s\!\left(\tfrac{i}{n}, \tfrac{j}{n}, \tfrac{k}{n}\right),\; n\,v\!\left(\tfrac{i}{n}, \tfrac{j}{n}, \tfrac{k}{n}\right) \right]$$ (26) 

The hue histogram can then be computed as 

$$\mathit{hue}(H) = \pi_{3,+}(\pi_{2,+}(T_f(H)))$$ (27) 
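
A direct way to obtain a comparable result in Python is to push every RGB bin forward to the hue bin of its centre color and accumulate the counts. This is a sketch of the idea rather than a literal transcription of equations (26) and (27): it uses the standard-library `colorsys.rgb_to_hsv` for the color conversion, takes bin centres as representative colors, and merges the re-indexing and the two projections into one loop. All of these are assumptions of the example.

```python
import colorsys

def hue_histogram(rgb_hist, n):
    """Accumulate an n-bin hue histogram from an n x n x n RGB histogram.

    rgb_hist maps 1-based (i, j, k) bins to counts.
    """
    hue = {(b,): 0 for b in range(1, n + 1)}
    for (i, j, k), count in rgb_hist.items():
        # Assumption: the bin centre, scaled to [0, 1], represents the bin.
        r, g, b = ((c - 0.5) / n for c in (i, j, k))
        h, _s, _v = colorsys.rgb_to_hsv(r, g, b)
        hue_bin = min(int(h * n) + 1, n)
        hue[(hue_bin,)] += count
    return hue

rgb = {(4, 1, 1): 5, (1, 4, 1): 2}      # mostly red and mostly green pixels
hues = hue_histogram(rgb, 4)
```

Saturation and value are discarded inside the loop, which plays the role of the two sum-projections in equation (27).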
Answering queries. The algebra provides a powerful tool for 
the specification of queries in interactive systems. Due to the greater 
complexity of these examples, we will assume that the histogram algebra 
is expressed in a suitable programming language. In particular, we will 
write the functions using the ML programming language (Ullman, 1997). 
ML was chosen for its powerful handling of functions, 
including support for currying. In ML, a function f that takes an integer 
and returns an integer (e.g. fun f (x:int) = x) is a first class object 
of type int -> int. On the other hand, a function defined as fun f x y 
= x*y, where all the variables are integers, is a curried function of type 
int -> (int -> int); that is, the function takes an integer value (x) 
and produces a function that takes the value y and returns an integer. 
In other words, the expression f (x) is a function of type int -> int. 
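
For readers more familiar with Python than ML, the same idea can be expressed with a function that returns a function. This is only an analogy written for this purpose; Python does not curry automatically.

```python
def mul(x):
    """Curried multiplication: takes x and returns a function that awaits y."""
    def with_y(y):
        return x * y
    return with_y

double = mul(2)     # partial application: a function from int to int
```

Here `mul(3)(4)` applies both arguments in turn, while `double` corresponds to the ML expression `f (x)` with `x` fixed to 2.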

The rest of the ML syntax used in the following is rather intuitive, and 
should make the examples understandable also to readers not familiar 
with ML. 

Example 1. "Find all the histograms that behave like the function $f$ 
in the interval $[I, J]$". A similarity measure for this query is given by 

- fun D(f, H, I, J) = 
    let 
      val hist1 = σ_{I<x<J} (Gp (Dom(H), f, Size(H))); 
      val hist2 = σ_{I<x<J} (H); 
    in 
      L2(hist1, hist2) 
    end; 
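
The measure D can be approximated in Python for one-dimensional histograms. The sketch departs from the algebra in two labeled ways: the reference histogram is obtained by evaluating `f` at bucket centres instead of calling the generator Gp, and the selection keeps the buckets whose centre lies strictly inside the interval. The list-based representation and the names are assumptions of the example.

```python
import math

def similarity(f, values, bounds, lo, hi):
    """L2 distance between a histogram and samples of f inside (lo, hi).

    values[i] is the content of bucket i and bounds[i] its (lower, upper) limits.
    Assumption: f is compared at bucket centres only.
    """
    total = 0.0
    for value, (a, b) in zip(values, bounds):
        centre = (a + b) / 2
        if lo < centre < hi:                 # selection restricted to the interval
            total += (f(centre) - value) ** 2
    return math.sqrt(total)

values = [0, 1, 2, 3]
bounds = [(0, 1), (1, 2), (2, 3), (3, 4)]
```

A result of zero means the histogram follows `f` exactly on the interval; larger values indicate a poorer match.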



Example 2. "Find all peaks of a histogram". The peaks are to 
be determined using the following rules: first the histogram is filtered 
with a given kernel $K = [w_{-m}, \ldots, w_0, \ldots, w_m]$, then the maxima are 
detected. Points adjacent to a peak are not considered peaks. For the 
sake of simplicity, we will only consider one-dimensional histograms. 

We begin with the definition of two functions that shift a histogram 
by an amount $I$, where $I$ is an index. The first function pads the shifted 
portion with zeros: 

- fun shft1(H, I) = 
    ε_I (null (Size (H)), H) 

The second function rotates the histogram, as if it were a periodic 
function: 

- fun shft2(H, I) = 
    ε_I (null (Size (H)), H) + ε_{-I mod Size(H)} (null (Size (H)), H) 

The filter operator, with kernel K, is defined as follows: 

- fun F(K, m, H) = 
    let 
      fun Ft(K, m, H) = 
        if m = 1 then 
          K[1] * H 
        else 
          (K[1] * H) + Ft(K[2:m], m-1, shft2(H, 1)) 
    in 
      Ft(K, 2*m+1, shft2(H, -m)) 
    end; 

Note that in this case we use the second shift operator, that is, we 
consider the histogram as a periodic function. If this is undesired, the 
histogram can be embedded into a null histogram of size Size(H) + 2m 
and the first shift operator can be used without losing data. 
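
The two shifts and the periodic filter can be mirrored in Python on plain lists. This is a sketch for one-dimensional histograms only; the function names are illustrative, and the filter is written as a loop over the kernel instead of the recursion used above.

```python
def shift_zero(h, i):
    """Shift h by i bins, filling vacated bins with zeros (in the spirit of shft1)."""
    n = len(h)
    return [h[j - i] if 0 <= j - i < n else 0 for j in range(n)]

def shift_rotate(h, i):
    """Shift h by i bins, wrapping around the ends (in the spirit of shft2)."""
    n = len(h)
    return [h[(j - i) % n] for j in range(n)]

def filter_periodic(kernel, h):
    """Sum of the 2m+1 rotated copies of h, each weighted by one kernel entry."""
    m = len(kernel) // 2                      # kernel must have odd length 2m+1
    out = [0] * len(h)
    for pos, weight in enumerate(kernel):
        shifted = shift_rotate(h, pos - m)    # shifts run from -m to +m
        out = [o + weight * s for o, s in zip(out, shifted)]
    return out
```

Replacing `shift_rotate` by `shift_zero` inside the filter gives the non-periodic variant, at the cost of losing the data shifted past the ends unless the histogram is padded first.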

The peaks will be returned in the form of a histogram that is zero 
everywhere except in the peak locations, where it has value 1. We will 
use the indicator function $\delta(a, b)$ whose value is 1 if $a = b$ and zero 
otherwise. The following operator takes a histogram $H$ and returns a 
histogram with all bins set to zero except where $H$ attains its maximum: 

- fun M(H) = 
    let 
      fun f(H)(x) = π_{1,max}(H) 
    in 
      H (δ) Gp (Dom(H), f(H), Size(H)) 
    end; 

The following auxiliary function takes a histogram H and a 
histogram G with the same dimension and bucket dimension as H. For 
every bin in G with a nonzero value, it creates three bins in H with 
zero value. In other words, the function creates "holes" in H of size 3 
corresponding to the values in G. 

- fun Q(H, G) = 
    let 
      fun f(x) = 1; 
      val h1 = Gp (Dom(H), f, Size(H)) - G; 
      val h2 = Gp (Dom(H), f, Size(H)) - shft1(G, 1); 
      val h3 = Gp (Dom(H), f, Size(H)) - shft1(G, -1); 
    in 
      (* product of the masks, so that a zero in any mask zeroes the bin *) 
      H * (h1 * h2 * h3) 
    end; 

Finally, the function P finds the n highest peaks in the histogram: 

- fun P(H, n, K, m) = 
    let 
      fun Pt(H, n) = 
        if n = 1 then 
          M(H) 
        else 
          M(H) + Pt(Q(H, M(H)), n-1) 
    in 
      Pt(F(K, m, H), n) 
    end; 
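
The peak search can be followed step by step in Python. The sketch works on plain lists, omits the initial filtering (the input is assumed to be already smoothed) and assumes positive bucket values, so that zeroed holes are never mistaken for maxima. The function names are illustrative stand-ins for M, Q and Pt.

```python
def maxima(h):
    """Indicator list: 1 where h attains its maximum, 0 elsewhere (role of M)."""
    top = max(h)
    return [1 if value == top else 0 for value in h]

def punch_holes(h, g):
    """Zero h at each nonzero bin of g and at its two neighbours (role of Q)."""
    out = list(h)
    for j, flag in enumerate(g):
        if flag:
            for nb in (j - 1, j, j + 1):
                if 0 <= nb < len(h):
                    out[nb] = 0
    return out

def peaks(h, count):
    """Indicator list of the `count` highest non-adjacent peaks (role of Pt)."""
    found = maxima(h)
    if count == 1:
        return found
    rest = peaks(punch_holes(h, found), count - 1)
    return [a + b for a, b in zip(found, rest)]
```

For `[1, 5, 4, 0, 3, 1]` the second peak is the bin holding 3, not the bin holding 4, because the latter is adjacent to the first peak and is removed by the hole.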

Example 3. "Find whether, at a specific point in time $t$ of a video 
sequence, the dominant motion in the image is reversed (change of 
approximately 180 degrees) in less than three frames". We are looking for 
the event that the motion is reversed sometime between $t$ and $t + 3$. 

The data are stored in a relational database whose schema comprises 
just one table: 

MOTION(t : int, h : histogram) (28) 

where t is the time at which the motion is considered, and h is a 
histogram counting how many points are moving in a given direction. 

We begin by writing a function that determines whether, given two 
histograms $H_1$ and $H_2$, there is an inversion of motion between the 
two. The function works as follows (see Fig. 3). We first take, for each 
histogram, the highest peak, which is representative of the dominant 
motion, then we add the two histograms to obtain a single histogram 
with two peaks. Using the function P defined above, we can write this 
histogram as 

$$P(H_1, 1) + P(H_2, 1)$$ (29) 

Figure 3 First steps of the determination of motion inversion between two 
histograms. 

Figure 4 Determination of the distance of two peaks of a histogram. 

We then proceed as illustrated in Fig. 4. The histogram of the two 
major peaks, represented in Fig. 4.a, is shifted until the lowest peak is 
on the first bin (Fig. 4.b). Then, a "gauge" histogram is created using 
a Gaussian function centered at a bin corresponding to the distance at 
which we want to check the presence of a peak (the Gaussian allows us 
to tolerate slight misplacements of the second peak, and to control the 
extent to which these displacements can be accommodated), as in Fig. 4.c. 
Finally, the Gaussian histogram and the peak histogram are multiplied, 
giving us a measure of the presence of a second peak in the desired 
position. The shift function Sh uses the function shft2 defined above: 

- fun Sh (H) = 
    let 
      fun f (x) = if x = 0 then 1 else 0; 
      val cond = proj(1, max, H * Gp(Dom(H), f, Size(H))) != 0 
    in 
      (* rotate by one bin until a peak reaches the first bin *) 
      if cond then H else Sh(shft2(H, 1)) 
    end; 

The following function checks if the histogram has a peak around the 
position y: 

- fun Pn(H, y) = 
    let 
      fun f y x = Gauss(x - y, sigma); 
      val hist = H * Gp(Dom(H), f (y), Size(H)) 
    in 
      π_{1,max} (hist) 
    end; 

The function returns a value between 0 and 1 representing the confidence 
that the histogram has a peak in the given position. A threshold can 
be applied if a hard decision is needed. With these functions, we can 
write the function MInv(H1, H2) that returns a number between 0 and 1 
representing the confidence that a motion inversion takes place between 
the histograms H1 and H2: 

- fun MInv(H1, H2) = 
    let 
      val H = Sh(P(H1, 1) + P(H2, 1)); 
      val y = (dom(H, 1).up - dom(H, 1).lo)/2; 
    in 
      Pn(H, y) 
    end; 
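
The whole pipeline (dominant peaks, shift to the first bin, Gaussian gauge, maximum of the product) fits in a short Python function. The sketch assumes one-dimensional direction histograms given as lists whose bins cover the full circle, uses an unnormalized Gaussian so that a perfect match scores exactly 1, and picks `sigma` in units of bins; these choices and the function name are assumptions of the example.

```python
import math

def inversion_confidence(h1, h2, sigma=1.0):
    """Confidence in [0, 1] that the dominant directions of h1 and h2 are opposite."""
    n = len(h1)
    peaks = [0] * n
    peaks[h1.index(max(h1))] = 1              # highest peak of each histogram
    peaks[h2.index(max(h2))] = 1
    while peaks[0] == 0:                      # rotate until a peak sits on the first bin
        peaks = peaks[1:] + peaks[:1]
    centre = n / 2                            # half a turn away, i.e. 180 degrees
    gauge = [math.exp(-((x - centre) ** 2) / (2 * sigma ** 2)) for x in range(n)]
    return max(p * g for p, g in zip(peaks, gauge))

forward = [0, 9, 0, 0, 0, 1, 0, 0]    # dominant motion in bin 1
backward = [0, 1, 0, 0, 0, 9, 0, 0]   # dominant motion in bin 5, half a turn away
sideways = [0, 0, 9, 0, 0, 0, 0, 0]   # dominant motion in bin 2
```

Thresholding the returned value, for instance at 0.5, turns the confidence into a yes or no answer; a larger `sigma` tolerates larger deviations from an exact reversal.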

With this function, it is possible to formulate the query in the database: 

SELECT r, s 
FROM MOTION r, MOTION s 
WHERE ABS(r.t - s.t) <= 3 AND MInv(r.h, s.h) > 0.5 

5. SUMMARY AND OUTLOOK 

In this paper, we have argued that image database languages can be 
built based upon feature algebras and demonstrated how data 
manipulations and queries can be executed with a feature algebra in terms of 
histograms. While in this paper we have not formalized the exact query 
classes expressible in terms of this algebra, the peak detection and 
motion inversion examples suggest that, when embedded in a functional 
language, the algebra can express more complex queries than possible 
with current systems. We plan to investigate the expressiveness 
properties of the histogram algebra in the future, and expect that it will provide 
us some insight into the desired properties of a more general feature 
algebra. 

We envision two types of advantages of feature algebras. The first 
is the flexibility that their operations allow in the description of the relevant 
similarity criteria for image databases. The relevant parts of the image 
features can be singled out at query time, using a high level algebra, 
rather than relying on the hard-coded operations typical of computer 
vision. In this sense, we note that the operations that we have defined for 
histograms apply to many different features which rely on similar data 
structures, e.g. Fourier transforms. The explicit extension of feature 
algebras to general image features is currently under way. The second 
advantage is that a description in terms of a feature algebra can bring 
to image and video data many techniques developed for databases (such as 
query optimization) that, so far, have been difficult to extend to such data. 

Notes 

1. Some video operations such as joint color and motion-based object segmentation can 
provide features with both spatial and temporal support. 
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QUERY-BY-TRACE: 

VISUAL PREDICATE SPECIFICATION IN 

Abstract In this paper we propose a visual interface for the specification of predicates to 
be used in queries on spatio-temporal databases. The approach is based on a 
visual specification method for temporally changing spatial situations. This 
extends existing concepts for visual spatial query languages, which are only 
capable of querying static spatial situations. We outline a preliminary user inter- 
face that supports the specification on an intuitive and easily manageable level, 
and we describe the design of the underlying visual language. The visual nota- 
tion can be used directly as a visual query interface to spatio-temporal data- 
bases, or it can provide predicate specifications that can be integrated into 
textual query languages leading to heterogeneous languages. 

Keywords Spatio-Temporal Queries, Visual Predicate Specification, Visual Database 
Interface 



1. INTRODUCTION 

Spatio-temporal databases deal with spatial objects that change over time (for 
example, they move or they grow): cars, planes, people, animals, ..., storms, 
lakes, forests, etc. Hence, database systems, in particular, spatial and temporal 
database systems, and geographical information systems (GIS) need to be 
extended to handle this kind of information. Of particular interest is, of 
course, the development of simple but powerful query languages that allow 
one to ask for changes in spatial relationships, for instance: “Has a tornado 
ever crossed Iowa?” or “Which planes were able to avoid a certain blizzard?”. 
A formal foundation for these kinds of queries is given by spatio-temporal 
predicates (Erwig et al., 1999e). Whereas it is possible to identify a relatively 
small set of spatial predicates (Egenhofer et al., 1991), it is almost impossible 
to do so in the spatio-temporal case, simply because there are too many of 
them. Thus, there is a very strong need for a simple way of specifying spatio- 
temporal situations, and a visual notation can be extremely helpful here. 

We will propose a visual language for spatio-temporal predicates. The 
main idea is to represent a spatio-temporal object (such as a car or a storm) in 
a two-dimensional way by its trace. The intersections of such a trace with 
another object’s trace are interpreted and translated into a sequence of predi- 
cates, called development, that can then be used, for example, to query spatio- 
temporal databases. This interpretation is described in (Erwig et al., 1999d). 

The described visual notation can be employed in several ways. One appli- 
cation is, as already mentioned, to realize a visual query interface to spatio- 
temporal databases. But we can also use pictures of this language as specifica- 
tions for (complex) spatio-temporal predicates, which can then be used in 
arbitrary query languages. One interesting possibility is to use a well-accepted 
textual query language like SQL, extend it by spatio-temporal objects and 
predicates (Erwig et al., 1999b), and use pictures to represent predicates in 
WHERE clauses. This leads then to a heterogeneous visual language (Erwig 
et al., 1995). 

The paper is structured as follows: after commenting on related work in 
the next section, we demonstrate in Section 3 as an application of visual 
development specifications a visual query interface to a spatio-temporal data- 
base. In Section 4 we describe how spatio-temporal data can be modeled. In 
particular, we explain the notions of spatio-temporal objects, predicates, and 
developments. In Section 5 we then explain and motivate the design of our 
visual notation for developments. Finally, conclusions are given in Section 6. 



2. RELATED WORK 

The similarity of spatial and temporal phenomena has been recognized for a 
long time in the literature. Both phenomena deal with “spaces” or “dimen- 
sions” of some kind and are thus closely related. Recently, research efforts 
have led both in spatial and in temporal data modeling to an increased interest 
in integrating both directions into a common research branch called spatio- 
temporal data modeling and in constructing spatio-temporal databases. 
Their underlying basic entities are called spatio-temporal objects and are 
ubiquitous in everyday life. Consider the flight of an airplane, the migration of 
whales, the raging of a storm, or the spreading of a fire region. Characteristic 
features of all these objects are that they are spatial entities changing over 
time and that these changes are continuous. Changes refer, for example, to 
motion, shrinking, growing, shape transformation, splitting, merging, disap- 
pearing, or reappearing of spatio-temporal objects. In particular, the capabil- 
ity of incorporating continuous change of spatial objects over time belongs to 
the most challenging requirements of spatio-temporal data models. 

In the meantime, some data models for spatio-temporal databases have 
already been proposed. In (Worboys, 1994) a spatial data model has been gen- 
eralized to become spatio-temporal: spatio-temporal objects are defined as so- 
called spatio-bitemporal complexes whose spatial features are described by 
simplicial complexes and whose temporal features are given by bitemporal 
elements attached to all components of simplicial complexes. On the other 
hand, temporal data models have been generalized to become spatio-temporal 
and include variants of Gadia’s temporal model (Gadia et al., 1993) which are 
described in (Cheng et al., 1994, Bohlen et al., 1998). The main drawback of 
all these approaches is that ultimately they are incapable of modeling contin- 
uous changes of spatial objects over time. 

Our approach to dealing with spatio-temporal data takes a more integrated 
view of space and time and includes the treatment of continuous spatial 
changes. It introduces the concept of spatio-temporal data types (Erwig et al., 
1998a, 1999a). These data types are designed as abstract data types whose 
values can be integrated as complex entities into databases (Stonebraker, 
1986) and whose definition and integration into databases is independent of a 
particular DBMS data model. 

The definition of a temporal object (Erwig et al., 1998b) in general is moti- 
vated by the observation that anything that changes over time can be 
expressed as a function over time. A temporal version of an object of type a is 
then given by a function from time to a. Spatio-temporal objects are regarded 
as special instances of temporal objects where a is a spatial data type like 
point or region. A point (representing an airplane, for instance) that changes 
its location in the Euclidean plane over time is called a moving point. Simi- 
larly, a temporally changing region (representing a fire area, for instance) is a 
region that can move and/or grow/shrink and whose components can split or 
merge. We call such an object an evolving region. 

Similar to our approach, in (Yeh et al., 1993, 1995) based on the work in 
(Segev et al., 1993) so-called behavioral time sequences are introduced. Each 
element of such a sequence contains a geometric value, a date, and a behav- 
ioral function, the latter describing the evolution between two consecutive 
elements of the sequence. Whereas this approach mainly focuses on represen- 
tational issues and advocates the three-dimensional object view of spatio-tem- 
poral objects, we are particularly interested in an algebraic model of general 
spatio-temporal data types including a comprehensive collection of spatio- 
temporal operations. Nevertheless, behavioral time sequences could be used 
as representations for our temporal objects. 

Temporal changes of spatial objects induce modifications of their mutual 
topological relationships over time. For example, at one time two spatio-tem- 
poral objects might be disjoint whereas some time later they might intersect. 
These modifications usually proceed continuously over time but can, of 
course, also have a discrete property. We have already devised and formally 
defined a concept for such spatio-temporal relationships which are described 
by so-called spatio-temporal predicates (Erwig et al., 1999e). We call a 
sequence of spatial and spatio-temporal predicates a development. 

Since we are dealing with predicates, it is not surprising that logic-based 
approaches are related to our work. Allen (Allen, 1984) defines a predicate 
Holds(p,i) which asserts that a property p is true during a time interval i. Galton 
(Galton, 1995) has extended Allen’s approach to the treatment of temporally 
changing two-dimensional topological relationships. Topological predicates 
are taken from the RCC model (Cui et al., 1993), which yields results 
similar to Egenhofer’s 9-intersection model, briefly discussed below. 
In contrast to these approaches, we have pursued a hybrid approach taking 
into account elements from temporal logic and elements from point set theory 
and point set topology. The main reason for not taking a purely logical approach 
is the intended integration of spatio-temporal objects and predicates into 
spatio-temporal databases and query languages. These require concrete 
representations for spatio-temporal objects and, besides predicates, the possibility of 
constructing new objects through spatio-temporal operations. Hence, efficiency, 
in particular for the evaluation of spatio-temporal queries, is 
indispensable. 

Our work on spatio-temporal predicates is based on Egenhofer’s 9-inter- 
section model (Egenhofer et al., 1991) for topological predicates between 
spatial objects in two-dimensional space. The goal of this model is to provide 
a canonical collection of topological relationships for each combination of 
spatial types. The model rests on the nine possible intersections of boundary, 
interior, and exterior of a spatial object with the corresponding parts of 
another object. Each intersection is then tested with regard to the topologi- 
cally invariant criteria of emptiness and non-emptiness. From the total num- 
ber of possible topological constellations only a certain subset makes sense 
depending on the combination of spatial objects just considered. For two 
regions, eight meaningful constellations have been identified which lead to 
the eight predicates called equal, disjoint, coveredBy, covers, intersect, meet, 
inside, and contains. For a point and a region we obtain the three predicates 
disjoint, meet, and inside. For two points we get the two predicates disjoint 
and meet (which corresponds to equality). For each group all predicates are 
mutually exclusive. They are also complete in the sense that they cover all 
possible topological constellations under the assumptions of the 9-intersec- 
tion model. 

There exist several approaches to visual query languages for spatial data- 
bases, for example, (Aufaure-Portier, 1995, Calcinelli et al., 1994, Egenhofer, 
1996, Lee et al., 1995). Common to all these approaches is that they allow 
querying only static spatial situations, that is, they can express queries like 
“Retrieve all airports in Ohio”. A characteristic of the involved objects “air- 
port” and “Ohio” is that these objects rarely change their location and/or 
extent. There are also a few approaches to querying image sequences (Arndt 
et al., 1989, Del Bimbo et al., 1995, Walter et al., 1992). However, the goal of 
these proposals is mainly to facilitate queries on video databases and not the 
querying of spatial (or spatio-temporal) databases. Since video data is largely 
unstructured (just a sequence of images), all these approaches have to be 
concerned with additional symbolic representations for the stored images to 
enable queries. Our visual notation is translated into sequences of predicates 
that can be directly checked for the database representation of the spatio-tem- 
poral objects. A short, preliminary proposal for the visualization idea that is 
developed in this paper has been presented in (Erwig et al., 1999c). 



3. QUERY-BY-TRACE 

The following scenario illustrates how our visual notation can be employed 
for querying developments in spatio-temporal databases. We give a rough out- 
line of the interaction a user may perform when visually specifying queries. 
Our goal is to interactively and graphically produce a sketch from which a 
spatio-temporal predicate can be derived. 

The user interface Query-By-Trace (QBT) allows a comfortable specifica- 
tion of developments. It incorporates an editor component to draw specifica- 
tions. The horizontal dimension is the x-axis; the vertical dimension describes 
time. The top of the editor provides two menus, one for moving points and 
one for evolving regions. Assuming a relational setting, both menus show the 
available attributes related to spatio-temporal objects in the database together 
with the corresponding relation names in brackets. In our example we use an 
environmental database containing weather and flight information. Assume 
that a user asks for all flights crossing hurricanes. The user selects from the 
menu Evolving Regions the attribute extent of the relation hurricanes (see 
Figure 1) and clicks at a desired position on the canvas of the editor. The 
result of this action is a circle labeled with the name of the selected relation 
(see Figure 2). 

Figure 1 Selecting First Object 




Two things are striking. First, we can observe that the circle and the vertical 
line, respectively, are static. Since we are going to investigate the topological 
relationships between two moving objects, it does not matter whether both 
objects or only one object moves, because these relationships are independent 
of metric and distance properties. It is only necessary that one object moves to be able to 
describe and visualize the process of the temporal evolution of the spatial 
relationships between the objects. Second, we do not need to model the real 
extent, shape, and location of an evolving region and the exact location of a 
moving point over time. We can abstract from these aspects since we are only 
interested in specifying topological relationships, which is a task for which 
we do not need any metric information. 

Depending on the next selection, the kind of query is determined: if 
another region is chosen, a development between two regions is being speci- 
fied; otherwise a point/region development is going to be sketched. If instead 
of a moving region a moving point were selected at the beginning, the user would 
only be allowed to select another point, and a development between two moving 
points would be specified. 

In our example the user now selects from the menu Moving Points the 
attribute route of the relation flights. This indicates that the user is interested 
in specifying the development between an evolving region and a moving 
point. The second selection always creates a point or a circle that can be 
moved over the canvas. The user next draws a crossing situation which 
requires the following interactions: a click at a desired position outside the 
circle produces the starting point. The system determines the initial relation- 
ship of this point with respect to the evolving region and displays the spatial 
predicate disjoint in the two message lines at the bottom of the editor. Now the 
user drags the mouse from bottom to top from the starting point towards the 
circle. Note that during the specification process it is not possible to drag the 
mouse cursor below the current position since a spatial object cannot move 
backward in time. As soon as the mouse cursor leaves the starting point, the 
name of the spatio-temporal predicate Disjoint is added to the message lines. 
The fact that a spatial predicate is constant for a certain period is registered in 
the message lines by a spatio-temporal predicate indicated by an initial capital 
letter (for example, Disjoint, Meet). 

We distinguish between the raw mode and the normalized mode of a 
development specification. The raw mode corresponds to the original defini- 
tion of a development as an alternating sequence of spatial and spatio-tempo- 
ral predicates. The normalized mode introduces simplifications to make the 
specification more readable for the user. One of these simplifications is that a 
spatial predicate (like disjoint) followed or preceded by its corresponding spa- 
tio-temporal predicate (like Disjoint) can be abbreviated to the spatio-tempo- 
ral predicate. Hence, in the second message line we only see the Disjoint 
predicate so far, see Figure 3. 
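
The simplification described above can be sketched in a few lines. This is only an illustration, not the QBT implementation: the list encoding of a development and the name `normalize` are assumptions made for the example. Lower-case names stand for spatial predicates, capitalized names for spatio-temporal predicates.

```python
def normalize(raw):
    """Drop every spatial predicate that is adjacent to its own
    spatio-temporal counterpart, e.g. 'disjoint' next to 'Disjoint'."""
    normalized = []
    for i, pred in enumerate(raw):
        if pred[0].islower():
            lifted = pred[0].upper() + pred[1:]  # 'meet' -> 'Meet'
            neighbours = raw[max(i - 1, 0):i] + raw[i + 1:i + 2]
            if lifted in neighbours:
                continue  # absorbed by the neighbouring spatio-temporal predicate
        normalized.append(pred)
    return normalized


print(normalize(["disjoint", "Disjoint"]))
```

For the raw sequence `disjoint Disjoint` shown in the first message line, the sketch yields the single predicate `Disjoint` of the second message line. An instant predicate such as `meet` between `Disjoint` and `Inside` is kept because neither neighbour is `Meet`.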




Figure 3 Raw and Normalized Modes 

While moving the mouse, the system draws the trace of the point and steadily 
watches for possible changes in the topological relationship. Each change is 
recorded in the message lines. The user now continues to move the cursor 
towards the circle and then traverses it. So far the user has specified an Enter 
situation, that is, the moving point at some time has met the circle and has 
been inside the circle since then, see Figure 4. Afterwards the user drags the 
mouse to an end point outside the circle and releases the mouse button. The 
final picture is shown in Figure 5. 

Figure 4 Dragging Sample Objects 
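
How a drawn trace might be turned into a predicate sequence can be sketched as follows. The sketch assumes that the trace arrives as a list of sampled canvas positions and that the static region is a circle given by center and radius; the function names, the tolerance `eps`, and the sampling itself are illustrative assumptions, not details of the QBT system. Where two consecutive samples jump directly between outside and inside, the sketch inserts an instant `meet`, since a continuously moving point has to touch the boundary in between.

```python
import math


def classify(point, center, radius, eps=1e-9):
    """Spatial predicate between a point and a circular region."""
    d = math.dist(point, center)
    if abs(d - radius) <= eps:
        return "meet"
    return "inside" if d < radius else "disjoint"


def development(trace, center, radius):
    """Derive a (normalized) development from a sampled trace."""
    runs = []  # run-length encoding of the per-sample predicates
    for point in trace:
        name = classify(point, center, radius)
        if runs and runs[-1][0] == name:
            runs[-1][1] += 1
        else:
            runs.append([name, 1])
    result = []
    for name, length in runs:
        if name == "meet":
            # one sample on the border: instant; several samples: period
            result.append("Meet" if length > 1 else "meet")
        else:
            if result and result[-1] not in ("meet", "Meet"):
                result.append("meet")  # border crossed between two samples
            result.append(name.capitalize())
    return result


trace = [(0, -3), (0, -2), (0, -1), (0, 0), (0, 1), (0, 2)]
print(development(trace, center=(0, 0), radius=1))
```

For the sample trace, which passes straight through the circle, the sketch produces `Disjoint meet Inside meet Disjoint`.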

If the second selected object is also an evolving region, a 
development between two evolving regions is specified. The user can in 
this case move a second circle which is smaller than the first one. The trace 
consists of two disjoint curves spanning a corridor which the moved circle 
traverses while being dragged from a start position to its end position. The 
two predicates meet and coveredBy (describing the situations when the circles 
touch externally, respectively, internally) are called instant predicates since 
only they can be valid at an instant. They can, of course, also be valid for 
some period (Meet, CoveredBy). To distinguish these two cases interactively, 
the drawing of Meet and CoveredBy is supported by holding down the shift- 
key during dragging. The movement of the mouse is then restricted to go 
along the border of the constant object until the shift-key is released again. At 
the end of a dragging transaction, both the visual specification and the spatio- 
temporal predicate sequence are immediately available. 
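
For the region/region case, the relationship between the smaller moved circle and the larger static circle depends only on the distance of their centers. The following sketch makes this explicit; it assumes circles given by center and radius, and the function name and the tolerance `eps` are hypothetical choices for the example.

```python
import math


def classify_circles(moved_center, r, static_center, R, eps=1e-9):
    """Spatial predicate of the moved circle (radius r) with respect
    to the static circle (radius R); the moved circle must be smaller."""
    if not r < R:
        raise ValueError("the moved circle must be smaller than the static one")
    d = math.dist(moved_center, static_center)
    if abs(d - (R + r)) <= eps:
        return "meet"        # circles touch externally
    if d > R + r:
        return "disjoint"
    if abs(d - (R - r)) <= eps:
        return "coveredBy"   # circles touch internally
    if d > R - r:
        return "overlap"
    return "inside"


for x in (4, 3, 2, 1, 0):
    print(x, classify_circles((x, 0), 1, (0, 0), 2))
```

Moving the small circle from far right towards the center of the static circle passes through disjoint, meet, overlap, coveredBy, and inside in this order.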

Figure 5 Final QBT-Specification of the Cross Predicate 



An example for two moving points is given in (Erwig et al., 1999d). We 
believe that this user interface is intuitive and easy to use because the user acts 
(via the mouse) as a moving object that behaves exactly as the 
drawn spatio-temporal predicate demands. In other words, the user action 
precisely conforms to, or satisfies, the specification that is drawn. 



4. SPATIO-TEMPORAL OBJECTS, PREDICATES, AND DEVELOPMENTS 

In this section we will review some of the formal foundations and sketch our 
definition of spatio-temporal objects (Section 4.1), our concept of spatio-tem- 
poral predicates (Section 4.2), and our specification mechanism for spatio- 
temporal developments (Section 4.3). 

4.1. SPATIO-TEMPORAL OBJECTS 

One of our design goals is to define a spatio-temporal data model that is inde- 
pendent of a specific DBMS data model. This is achieved by encapsulating 
spatio-temporal data types into abstract data types which comprise a compre- 
hensive collection of operations and predicates. Assuming a relational setting, 
for instance, we can then embed spatio-temporal data types in the same way 
as types for integers, reals, booleans, or strings as attribute types in a 
relation, that is, the relation has only a container function to store attribute data in 
tuples. 

The design of our model for spatio-temporal data is as follows: for 
compatibility with smoothly changing spatio-temporal objects we choose a 
continuous model of time, that is, $time = \mathbb{R}$. The temporal version of a value of 
type $\alpha$ that changes over time can be modeled as a temporal function of type 

$\tau(\alpha) = time \rightarrow \alpha$ 

We have used temporal functions as the basis of an algebraic data model for 
spatio-temporal data types (Erwig et al., 1998a, 1999a) where $\alpha$ is assigned a 
spatial data type like point or region. For example, a point that changes its 
location over time is an element of type $\tau(point)$ and is called a moving point. 
Similarly, an element of type $\tau(region)$ is a region that can move and/or grow/ 
shrink. It is called an evolving region. Currently, we do not consider a 
temporal version of lines, mainly because there seem to be not many applications of 
moving lines. A reason might be that lines are themselves abstractions or pro- 
jections of movements and thus not the primary entities whose movements 
should be considered. In any case, however, it is possible in principle to 
integrate moving lines in much the same way as moving points if needed. In 
addition, we also have changing numbers and booleans, which are essential when 
defining operations on temporal objects. For instance, we could be interested 
in computing the (time-dependent) distance of an airplane and a storm. This 
could be achieved by an operation: 

$Distance : \tau(point) \times \tau(region) \rightarrow \tau(real)$ 

The example demonstrates the concept of temporal lifting, which avoids an 
inflation of operation names and definitions: we can, in principle, take almost any 
non-temporal operation (like $distance : point \times region \rightarrow real$) and “lift” it so 
that it works on temporal objects and also returns a temporal object as a result. 
More precisely, for each function $f : \alpha_1 \times \ldots \times \alpha_n \rightarrow \beta$ its corresponding lifted 
version 

$\uparrow f : \tau(\alpha_1) \times \ldots \times \tau(\alpha_n) \rightarrow \tau(\beta)$ 

is defined by: 

$\uparrow f(S_1, \ldots, S_n) := \{(t, f(S_1(t), \ldots, S_n(t))) \mid t \in time\}$ 

Hence, we can derive temporal operations rather automatically. For example, 
we obtain $Distance = \uparrow distance$. 
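
Temporal lifting can be illustrated with ordinary higher-order functions. In the following sketch a temporal object is a Python function from a time value to a spatial value, returning `None` where the object is undefined; this encoding, the disc-shaped storm, and all names (`lift`, `airplane`, `storm`) are assumptions made for the example and not part of the data model itself.

```python
import math


def lift(f):
    """Turn f : a1 x ... x an -> b into a function on temporal objects."""
    def lifted(*temporal_args):
        def at(t):
            values = [s(t) for s in temporal_args]
            if any(v is None for v in values):
                return None  # undefined where some argument is undefined
            return f(*values)
        return at  # the result is again a temporal object
    return lifted


def distance(point, disc):
    """Distance between a point and a disc given as (center, radius)."""
    center, radius = disc
    return max(0.0, math.dist(point, center) - radius)


Distance = lift(distance)


def airplane(t):
    """Moving point, defined only for 0 <= t <= 10."""
    return (t, 0.0) if 0 <= t <= 10 else None


def storm(t):
    """Evolving region, here a disc that happens to stay constant."""
    return ((5.0, 0.0), 2.0)


d = Distance(airplane, storm)
print(d(0.0), d(5.0), d(11.0))
```

The lifted operation is evaluated pointwise: at time 0 the airplane is 3.0 away from the storm, at time 5 it is inside (distance 0.0), and at time 11 the result is undefined because the airplane no longer exists.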

4.2. SPATIO-TEMPORAL PREDICATES 

Temporal lifting is, of course, also applicable to spatial predicates. Consider 
the spatial predicate 

$inside : point \times region \rightarrow bool$ 

The lifted version of this predicate has the type 

$\uparrow inside : \tau(point) \times \tau(region) \rightarrow \tau(bool)$ 

with the meaning that it yields true for each time at which the point is inside 
the region, undefined whenever the point or the region is undefined, and false 
in all other cases. We see that the lifted version is not a predicate since it 
yields a temporal boolean and not a (flat) boolean, which is what we would expect from 
a predicate. 

Our understanding of spatio-temporal predicates is the following: a spatio- 
temporal predicate is essentially a function that aggregates the values of a spa- 
tial predicate as it evolves over time. Thus, a spatio-temporal predicate is a 
function of type $\tau(\alpha) \times \tau(\beta) \rightarrow bool$ for $\alpha, \beta \in \{point, region\}$. 

If we consider the definition of $\uparrow inside$, we can define two spatio-temporal 
predicates sometimes-inside and always-inside that yield true if $\uparrow inside$ yields 
true at some time, respectively, at all times. Whereas the definition for some- 
times-inside is certainly reasonable, the definition for always-inside is ques- 
tionable since it yields false whenever the point or the region is undefined. 
This is not what we would expect. For example, when the moving point has a 
shorter lifetime than the evolving region and is always inside the region, we 
would expect always-inside to yield true. Actually, we can distinguish differ- 
ent kinds of “forall” quantifications that result from different time intervals 
over which aggregation can be defined to range. In the case of inside the 
expected behavior is obtained if the aggregation ranges over the lifetime of 
the first argument, the moving point. This is not true for all spatial predicates. 
In fact, it depends on the nature and use of each individual predicate. For 
example, two spatio-temporal objects are considered as being always-equal 
only if they are equal on both objects’ lifetimes, that is, the objects must have 
the same lifespans and must be always equal during these. 

In order to be able to concisely build spatio-temporal predicates, we use 
the following general syntax: $Q_{op}.p$ where $Q \in \{\forall, \exists\}$, $op \in \{\cap, \cup, \pi_1, \pi_2\}$ is 
a function mapping two sets into a new set ($\pi_i$ simply takes the $i$th argument 
set), and $p$ is a spatial predicate. Such an expression then denotes the 
spatio-temporal predicate: 

$\lambda(S_1, S_2).\ Q\, t \in op(dom(S_1), dom(S_2)).\ p(S_1(t), S_2(t))$ 

This means that, for example, $\forall_{\pi_1}.inside$ denotes the spatio-temporal 
predicate 

$\lambda(S_1, S_2).\ \forall t \in dom(S_1).\ inside(S_1(t), S_2(t))$ 

In general, $\lambda(x_1, x_2, \ldots).e$ denotes a function that takes arguments $x_1, x_2, \ldots$ 
and returns a value determined by the expression $e$. So the above expression 
denotes a function that takes two arguments $S_1$ and $S_2$ and yields the boolean 
value denoted by the $\forall$-expression. 

With this notation we can give the definitions for the spatio-temporal ver- 
sions of the eight basic spatial predicates (for two regions): 



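$Disjoint := \forall_{\cap}.disjoint$ 
$Meet := \forall_{\cup}.meet$ 
$Overlap := \forall_{\cup}.overlap$ 
$Equal := \forall_{\cup}.equal$ 
$Covers := \forall_{\pi_2}.covers$ 
$Contains := \forall_{\pi_2}.contains$ 
$CoveredBy := \forall_{\pi_1}.coveredBy$ 
$Inside := \forall_{\pi_1}.inside$ 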



For a moving point and a moving region we have just the three basic predi- 
cates Disjoint, Meet, and Inside, which are defined as above. For two moving 
points we have the basic predicates Disjoint and Meet, which are also defined 
as above. The chosen aggregations are motivated and discussed in detail in 
(Erwig et al., 1999e). 
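
The effect of the chosen aggregation can be made concrete with a small sketch. It is written under simplifying assumptions that are not part of the model: time is discretized, a temporal object is a dictionary from time stamps to values, a point is a number and a region an open interval, and a spatial predicate counts as false wherever one of its arguments is undefined. All names are hypothetical.

```python
def intersection(d1, d2):
    return d1 & d2


def union(d1, d2):
    return d1 | d2


def pi1(d1, d2):
    return d1


def pi2(d1, d2):
    return d2


def st_predicate(quantifier, op, p):
    """Build Q_op.p; quantifier is the built-in all or any."""
    def predicate(s1, s2):
        def holds(t):
            return t in s1 and t in s2 and p(s1[t], s2[t])
        return quantifier(holds(t) for t in op(set(s1), set(s2)))
    return predicate


def inside(x, interval):
    lo, hi = interval
    return lo < x < hi


Inside = st_predicate(all, pi1, inside)             # ranges over dom(S1)
always_inside_naive = st_predicate(all, union, inside)

point = {1: 5.0, 2: 6.0}                            # shorter lifetime
region = {t: (0.0, 10.0) for t in range(4)}

print(Inside(point, region), always_inside_naive(point, region))
```

The point lives only at times 1 and 2 and is inside the region at both. Aggregating over the lifetime of the first argument therefore yields true, whereas aggregating over the union of both lifetimes yields false because the point is undefined at times 0 and 3.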



4.3. DEVELOPMENTS 



Now that we have basic spatio-temporal predicates, the question is how to 
combine them in order to capture the change of spatial situations. That is, the 
issue is how to specify developments. In order to temporally compose differ- 
ent spatio-temporal predicates, we need a way to restrict the temporal scope 
of basic spatio-temporal predicates to specific intervals. This can be obtained 
by predicate constrictions (note that $S|_I$ denotes the partial function that yields 
$S(t)$ for all $t \in I$ and is undefined otherwise): let $I$ be a (half-) open or closed 
interval. Then 

$P_I := \lambda(S_1, S_2).\ P(S_1|_I, S_2|_I)$. 

Now we can define the composition of predicates as follows: 

$P \text{ until } p \text{ then } Q := \lambda(S_1, S_2).\ \exists t : p(S_1(t), S_2(t)) \wedge P_{(-\infty,t)}(S_1, S_2) \wedge Q_{(t,\infty)}(S_1, S_2)$ 
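
The composition can be tried out on discretized data. The sketch below makes several assumptions that go beyond the text: temporal objects are dictionaries from time stamps to values, points are numbers, regions are intervals, and the constriction is taken to the time stamps strictly before and strictly after the chosen instant. The helper names are hypothetical, and an empty constriction satisfies a universally quantified predicate vacuously.

```python
def restrict(s, keep):
    """Constriction of a temporal object {time: value} to selected times."""
    return {t: v for t, v in s.items() if keep(t)}


def forall(op, p):
    """Universally quantified spatio-temporal predicate over op(dom1, dom2)."""
    def predicate(s1, s2):
        return all(t in s1 and t in s2 and p(s1[t], s2[t])
                   for t in op(set(s1), set(s2)))
    return predicate


def until_then(P, p, Q):
    """P until p then Q: p holds at some instant t, P before t, Q after t."""
    def composed(s1, s2):
        for t in sorted(set(s1) & set(s2)):
            if not p(s1[t], s2[t]):
                continue
            before = lambda u: u < t
            after = lambda u: u > t
            if (P(restrict(s1, before), restrict(s2, before))
                    and Q(restrict(s1, after), restrict(s2, after))):
                return True
        return False
    return composed


def disjoint(x, iv):
    return x < iv[0] or x > iv[1]


def meet(x, iv):
    return x == iv[0] or x == iv[1]


def inside(x, iv):
    return iv[0] < x < iv[1]


Disjoint = forall(lambda d1, d2: d1 & d2, disjoint)
Inside = forall(lambda d1, d2: d1, inside)
Enter = until_then(Disjoint, meet, Inside)
Leave = until_then(Inside, meet, Disjoint)

flight = {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}
storm = {t: (2.0, 8.0) for t in range(5)}
print(Enter(flight, storm), Leave(flight, storm))
```

The sample flight is outside the storm at times 0 and 1, touches its border at time 2, and is inside afterwards, so it satisfies `Enter` but not `Leave`.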

When we now consider how spatial situations can change over time, we 
observe that certain relationships can be valid only for a period of time and 
not for only a single time point (given that the participating objects do exist 
for a period of time) while other relationships can hold at instants as well as 
on time intervals. Predicates that can hold at time points and intervals are: 
equal, meet, covers, coveredBy; these are called instant predicates. For exam- 
ple, an airplane and a hurricane can meet at a certain instant or for a whole 
period. Predicates that can only hold on intervals are: disjoint, overlap, inside, 
contains; these are called period predicates. For example, it is not possible for 
an airplane to be disjoint from a hurricane only at one point in time; they have 
the inherent property to be disjoint for a period. 

It is now interesting to see that in satisfiable developments instant and 
period predicates always occur in alternating sequence. For example, it is not 
possible that two continuously changing spatio-temporal objects satisfy 
Inside immediately followed by Disjoint. In contrast, Inside first followed by 
meet (or Meet) and then followed by Disjoint can be satisfied. Hence, devel- 
opments are represented by alternating sequences of spatio-temporal predi- 
cates and spatial predicates and are written down by juxtaposition (in this 
paper). A more formal treatment of compound spatio-temporal predicates and 
developments is given in (Erwig et al., 1999e). Our example of a flight run- 
ning into a hurricane can now be formulated as the composition: 

Disjoint until meet then Inside 

Since predicate composition is associative, we can abbreviate nested compo- 
sitions by writing down simply a sequence of the spatio-temporal and spatial 
predicates, that is, we can simply write Disjoint meet Inside for the above 
example. We introduce the name Enter for it to reuse it later. A flight running 
out of a hurricane can be characterized by Leave := Inside meet Disjoint. A 
flight that traverses a hurricane can be described by Disjoint meet Inside meet 
Disjoint using basic spatio-temporal predicates or shorter as Enter Leave 
using derived predicates; we introduce the name Cross for it. Note that spatial 
predicates and their corresponding spatio-temporal predicates (like meet and 
Meet) that occur next to each other in a development can be merged to the 
respective spatio-temporal predicate. We list a few further examples for two 
evolving regions: 

Enter := Disjoint meet Overlap coveredBy Inside 

Leave := Inside coveredBy Overlap meet Disjoint 

Cross := Enter Leave 

Touch := Disjoint meet Disjoint 

Bypass := Disjoint Meet Disjoint 

Graze := Disjoint meet Overlap meet Disjoint 
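
Writing developments by juxtaposition can be mimicked with plain lists. The following sketch assumes that a development is a list of predicate names and that identical neighbours arising at a seam (such as `Inside Inside` in `Enter Leave`) collapse into one; the function names are hypothetical. The alternation test only checks the necessary condition stated above and does not decide satisfiability.

```python
# Predicates that may hold at a single instant (spatial or lifted form).
INSTANT = {"equal", "meet", "covers", "coveredBy"}


def is_instant_kind(pred):
    return pred[0].lower() + pred[1:] in INSTANT


def juxtapose(*developments):
    """Concatenate developments, collapsing identical neighbours."""
    result = []
    for dev in developments:
        for pred in dev:
            if result and result[-1] == pred:
                continue  # e.g. 'Inside Inside' becomes 'Inside'
            result.append(pred)
    return result


def alternates(dev):
    """True if instant-kind and period-kind predicates alternate."""
    return all(is_instant_kind(a) != is_instant_kind(b)
               for a, b in zip(dev, dev[1:]))


ENTER = ["Disjoint", "meet", "Overlap", "coveredBy", "Inside"]
LEAVE = ["Inside", "coveredBy", "Overlap", "meet", "Disjoint"]
CROSS = juxtapose(ENTER, LEAVE)
BYPASS = ["Disjoint", "Meet", "Disjoint"]

print(" ".join(CROSS))
print(alternates(CROSS), alternates(BYPASS), alternates(["Inside", "Disjoint"]))
```

`Cross` expands to nine predicates with a single `Inside` in the middle, and the sequence `Inside Disjoint` is rejected because two period predicates follow each other directly.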

In order to assess the expressiveness of our visual notation we can ask which 
developments are possible at all and which developments can be specified by 
our visual language. Possible topological changes or transitions of spatio- 
temporal objects over time can be visualized in so-called development graphs 
whose vertices are labeled either with a spatial, that is, an instant, predicate or 
with a basic spatio-temporal predicate. Hence, each vertex models a time 
point or a time interval in which the corresponding predicate is valid. An edge 
(p, q) represents the transition from a predicate p to a predicate q and stands 
for p q. A path $(p_1, p_2, \ldots, p_n)$ within the graph describes a possible temporal 
development $p_1\ p_2 \ldots p_n$ of topological relationships between two spatial 
objects. For the point/point and for the point/region case we obtain the follow- 
ing two development graphs: 



[Development graphs. Point/point case: vertices Disjoint, meet, Meet. Point/region case: vertices Disjoint, meet, Meet, Inside.] 



Starting, for example, with Inside in the point/region case, we obtain seven 
possible development paths not properly containing cycles¹: 




Since the development graph is symmetric in this case (each of the four verti- 
ces can be selected as the start vertex of a path), we obtain a total of 28 paths. 
This means that there are 28 distinct temporal evolutions of topological changes 
between a moving point and an evolving region without repetitions. For each 
alternative we could define a separate spatio-temporal predicate. In the point/ 
point case we get 13 possible development paths. The development graph for 
the region/region case yields no fewer than 2198 paths and thus possible 
predicates (Erwig et al., 1999e). It is shown in Figure 6. 
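
The path counts for the two small graphs can be reproduced by exhaustive enumeration. Two things in the following sketch are assumptions rather than statements of the text: the edge sets, which are derived here from the alternation of instant and period predicates, and the reading of a quasi-cycle as a path whose first and last vertex denote the same predicate up to its instant or period form (for example meet and Meet). A path is counted if no proper subpath is such a quasi-cycle; single vertices count as paths.

```python
def quasi_equal(p, q):
    """Same predicate up to instant/period form, e.g. 'meet' and 'Meet'."""
    return p.lower() == q.lower()


def count_paths(graph):
    """Number of admissible paths per start vertex."""
    def extend(path):
        total = 1  # the path itself
        if len(path) >= 2 and quasi_equal(path[0], path[-1]):
            return total  # any extension would properly contain this quasi-cycle
        for nxt in graph[path[-1]]:
            if any(quasi_equal(v, nxt) for v in path[1:]):
                continue  # would close a quasi-cycle strictly inside the path
            total += extend(path + [nxt])
        return total
    return {v: extend([v]) for v in graph}


# Assumed edge sets (adjacency lists, both directions listed).
POINT_POINT = {
    "Disjoint": ["meet", "Meet"],
    "meet": ["Disjoint"],
    "Meet": ["Disjoint"],
}
POINT_REGION = {
    "Disjoint": ["meet", "Meet"],
    "Inside": ["meet", "Meet"],
    "meet": ["Disjoint", "Inside"],
    "Meet": ["Disjoint", "Inside"],
}

print(sum(count_paths(POINT_POINT).values()))
print(count_paths(POINT_REGION)["Inside"], sum(count_paths(POINT_REGION).values()))
```

Under these assumptions the enumeration gives 13 paths for the point/point graph, and 7 paths per start vertex, hence 28 in total, for the point/region graph, which agrees with the numbers stated above.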



1. More precisely, quasi-cycles, see (Erwig et al., 1999e). 

[Figure 6: development graph for the region/region case] 





There are some constraints imposed by our visual notation which restrict 
the possible development paths that can be expressed by a visual specifica- 
tion; consequently, they lead to a restriction of the development graph. These 
constraints are: (i) the sizes of the static circle and the moved circle are fixed, 
(ii) the static circle is larger than the moved circle, and (iii) our visual notation 
contains an implicit ordering of both circles, that is, the smaller moved circle 
symbolizes always the first argument of a predicate and the larger constant 
circle stands always for its second argument. These constraints lead to the fol- 
lowing restrictions of the development graph. 

First, from the three pairs coveredBy/covers, CoveredBy/Covers, and 
Inside/Contains only one relationship per pair, namely coveredBy, CoveredBy, 
and Inside, can be represented in our visual development specifications. 
Hence, we can remove the vertices covers, Covers, and Contains and their 
incident edges.² 

Second, four transitions in the graph, namely from CoveredBy and from 
Inside to equal and Equal, respectively, solely result from a growing or 
shrinking of one object. Since we can alter the proportions of neither the 
static circle nor the moved smaller circle, the vertices equal and Equal 
cannot be reached from CoveredBy and Inside, so that we can take away the 
corresponding four edges. 

Third, the transitions between Overlap and equal and between Overlap 
and Equal do not require growing or shrinking. But the prerequisite for these 
transitions is that the static circle and the moved circle have the same size, and 
just this is excluded by our visual notation. From an Overlap situation we can 
never come to an equal or an Equal situation so that the two corresponding 
edges must be removed. Because the vertices equal and Equal are isolated 
now, we can remove them, too. We obtain the final development 
graph shown in Figure 7. In this restricted graph, only 87 different paths not 
containing quasi-cycles are possible. 

2. Note that this restriction could be dropped if we allowed the moved circle to be made larger 
than the constant circle. 



[Graph not reproduced; surviving vertex labels: Disjoint, coveredBy, CoveredBy, Inside] 

Figure 7 Simplified Development Graphs 

All finite paths that can be obtained by this development graph can be 
specified with our visual language for the region/region case. 



5. VISUAL SPECIFICATIONS OF DEVELOPMENTS 

In this section we give a collection of design decisions that eventually lead to 
a simple and intuitive, yet powerful, two-dimensional visual language. 

The first design decision is essential to obtain an integrated notation for 
spatial and temporal aspects: 

(1) Represent the temporal dimension geometrically. This leads in a first 
step to a three-dimensional model of spatio-temporal objects. 

Now we could stop here and use 3D pictures to specify developments, but 
there are two main reasons for not doing so: first, drawing three-dimensional 
pictures is much more difficult than drawing 2D pictures. In particular, with- 
out specialized user input devices, it can become quite tedious to generate 3D 
drawings with mouse and keyboard. Such a drawing interface is also very 
hard to implement; it must offer many options to the user and is thus again 
more difficult to learn and to apply than a two-dimensional language. Second, 
three-dimensional illustrations of developments are overdetermined in the 
sense that they display (i) growing/shrinking and movement of regions and 
(ii) relative positions of the beginnings and endings of objects’ lifetimes. (The 
first point will be discussed in more detail below.) Such overspecifications are 
generally undesirable since they complicate the understanding of visual nota- 
tions because the user has to sort out much visual information that has no 
meaning for her specification. 

The second design decision is essentially a step to reduce overdetermination: 

(2) Abstract from exact positions/extents, and reduce two-dimensional 
geometric objects to one-dimensional ones. Use the y-axis to represent 
the temporal dimension. 

This essentially means to “forget” about the y-axis with regard to spatial 
information, and to represent a point as a point on the x-axis and a region as an 
x-interval. Thus, the y-axis can capture the temporal aspect of spatio-temporal 
objects so that a moving point is represented by a line and an evolving region 
is represented by a region as shown below: 




This picture describes a moving point that enters a region, then leaves the 
region and finally stops on the region’s border. It is striking that the sketched 
movement/shrinking/growing of the interval representing the evolving region 
does not contribute anything to this specification, that is, it would be as well 
possible to use a plain rectangle representing a stationary/constant region. The 
reason is that we are only specifying topological relationships, and thus we 
need only information about the relative positions of objects with respect to 
each other. In particular, we need not be concerned about the exact position or 
size information of objects.³ 

This leads to the third design decision: 

(3) Represent the evolving region in the definition of a point/region predi- 
cate (respectively, one evolving region in the definition of a region/re- 
gion predicate) simply as a circle. Likewise, represent one of the two 
moving points in a point/point spatio-temporal predicate as a vertical 
line. 

This leads to an easy-to-understand notation. For instance, the point/region 
predicate Bypass can be specified as shown in Figure 8. 

Figure 8 Visual Specification of Bypass 

3. Actually, this is not the whole truth: for some spatio-temporal developments, growing and shrinking 
is essential, but these cases are rare, and the complexity of an extension of the visual notation would not be 
justified by the relatively small gain in expressiveness, see Section 4. 

It remains to be explained how the second evolving region in the 
specification of a region/region predicate is represented. We do this analogously to the 
representation of moving points: display two objects (showing the moving 
object’s first and last position) connected by a trace specifying the object’s 
movement. The initial and the final object are given by two circles (that are 
smaller than the constant circle). The trace is depicted for a moving point by a 
dotted line, and for a moving region we use two dotted lines. For example, the 
predicate Graze is drawn as follows: 




The first and the last shown position of the evolving region are disjoint from 
the constant region, and thus they both represent the predicate disjoint. The 
trace represents the sequence Disjoint meet Overlap meet Disjoint. This is 
because the left trace border intersects the constant region in exactly two 
points and the right trace border does not intersect the constant region at all. 
Hence, altogether this picture denotes the predicate Graze. Some variations 
are shown below. 




Figure 9 More Spatio-Temporal Predicates 

Note that the exact interpretation can always be inferred from the intersec- 
tions of the trace borders with the static circle. This is explained in (Erwig et 
al., 1999d). 
6. CONCLUSIONS

We have demonstrated how a simple two-dimensional visual language can be
used to express predicates on spatio-temporal objects. This language is
well suited as a query interface to spatio-temporal databases. Having a
precise semantics, the visual notation can also serve as a formal language
to communicate and reason about spatio-temporal situations in general.
Abstract Multiple perspective video taken by several tens of cameras or
more is observed and stored for applications such as video surveillance
and outside broadcasting. One of the most important demands is to grasp
quickly what happens in the area from the multiple perspective video, but
several problems remain to be solved. For example, we cannot look at many
videos simultaneously, and it is difficult to understand the entire state
of a phenomenon that occurs in a broad area and is captured sparsely by
multiple cameras. In this paper, we propose a new skimming method using
tempo-spatial importance measures. First, the video importance is
calculated using the importance of the elements captured in the video
scene. The elements are objects such as buildings and cars, and events
such as a temperature rise and a batter hitting. They have their own
importance based on space, time and semantics. Then some videos are
selected based on their importance and displayed with a map and
three-dimensional graphics of the objects for skimming the multiple
perspective video effectively. We discuss the tempo-spatial importance of
multiple perspective video, the video importance calculation and the
display methods based on the importance. We also describe our prototype
and some experimental results.



Keywords: video database, skimming, multiple perspective video, tempo-spatial 
1. INTRODUCTION

Recently, much attention has been focused on archiving, retrieving and
delivering multiple perspective video data because of rapid advances in
digital video technologies. Multiple perspective video means a collection
of mutually synchronized video data taken by multiple video cameras. It
has many applications, such as building security surveillance, plant
monitoring and TV broadcasting of sports events. The video data of each
camera are digitized and stored in archival devices. In some applications,
such as building security surveillance, not only video data but also their
metadata (such as camera position data and sensory device data) can be
acquired and stored in a synchronized manner (Tsukada et al., 1997).

In considering applications of such multiple perspective video, one of
the most important research issues is how to present the vast volume of
video data taken by several tens of cameras to users. It is difficult for
human beings to watch and recognize the contents of many videos taken
simultaneously. Ramesh Jain et al. (Katkere et al., 1996) proposed a
system in which the video intervals that capture a walker, extracted by
image recognition techniques, can be queried and retrieved from multiple
perspective video data. However, the system is not applicable to various
applications because the query is ad hoc and works only for already-known
moving objects. In addition, it is not possible to grasp in a short time
what happens in the area captured by many cameras.

For browsing and displaying multiple perspective video, it is effective
to be able to understand intuitively where a video clip or a video frame
was taken. Some video browsing methods for that purpose utilize camera
field data, such as the camera’s position and direction, given by sensors
such as a GPS (global positioning system) receiver or by a remotely
controllable camera. Arikawa (Arikawa 1998) proposed a system in which the
camera icons that represent each video clip have their directions and are
displayed at their positions in a 3D virtual world. Joachim Sauter et al.
(Sauter et al.,) proposed a way to display each video frame at its camera
position and in its camera direction. But these are methods for a moving
camera and not for multiple perspective video.

On the other hand, in the area of video database systems, much work has
been done on ‘video skimming’, which summarizes a whole video based on the
‘importance’ of its scenes. For example, Kanade et al. (Wactlar et al.,
1996) proposed a way to do video skimming by combining speech recognition
techniques and information retrieval techniques. However, most work on
video skimming has been concerned with skimming a ‘single’ video stream.
Our attention is focused on skimming multiple video streams that are
taken concurrently. In this case, there are several temporal or spatial
relationships among the cameras, and so we propose a way to skim multiple
perspective video by considering these temporal and/or spatial
relationships.

In this paper, we propose a novel skimming method using tempo-spatial
importance measures. First, the video importance is calculated using the
importance of the elements captured in the video scene. The elements are
objects such as buildings and cars, and events such as a temperature rise
and a batter hitting. They have their own importance based on space, time
and semantics. Then some videos are selected based on their importance and
displayed with a map and three-dimensional graphics of the objects for
skimming the multiple perspective video effectively.

The rest of this paper is organized as follows. Section 2 explains the 
multiple perspective video and its skimming. We discuss the tempo- 
spatial importance of multiple perspective video, the video importance 
calculation and the display methods based on the importance in Section 
3 and Section 4. The prototype for evaluation and some experimental 
results are described in Section 5 and Section 6. Finally, we conclude 
this paper and indicate future work in Section 7. 

2. MULTIPLE PERSPECTIVE VIDEO AND 
ITS SKIMMING 

2.1. MULTIPLE PERSPECTIVE VIDEO 

Multiple perspective video in this paper is live and stored video taken
by several tens of cameras or more. Applications treating such multiple
perspective video include video surveillance for building security, plant
monitoring and superhighway facilities management, and broadcasting of
sports and festivals. Multiple perspective video has the following
metadata, which describe ‘when’, ‘where’ and ‘why’ the video was taken and
‘what is captured in the video’. In addition, the metadata can be created
automatically by various methods such as image sensing and speech
recognition, and stored with the video data. Therefore the metadata are
very useful for query and search.

■ Time stamp : It specifies when the video was taken.

■ Camera ID and camera field : They specify where the video was
taken. The camera field consists of the camera’s position, direction,
focal length and so forth. If the camera field and the position of an
object are known, whether the object is captured by the camera can be
calculated by geometry. We can then know what is captured in the
video by utilizing the camera field data and a geographic information
system that manages objects such as buildings and facilities on a
map. The camera field can be obtained from electronic equipment and
devices such as a controllable camera, a GPS receiver and a
gyrocompass.

■ Event : It specifies what happens to an object such as a facility or
a person, for example, a person intruding at night, two cars
colliding at a crossing, or a batter hitting a home run. Event data
consist of an event ID, an event type, a time stamp and so on. Various
sensing devices on the market create such events, for example, optical
sensors, temperature sensors and sound sensors. Image sensing
technology is also often used in practice for video monitoring and
surveillance, and there is much research on this technology (Brill et
al., 1998). Speech recognition technology can create events from
telephone speech and broadcast commentary.

■ Annotation : It is data that a user attaches to the video for
reporting and analysis. It is input as metadata with a keyboard or a
speech recognition device.
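The metadata items above, and the geometric test mentioned under the camera field, can be sketched as follows. This is only an illustration under stated assumptions: the class and field names, the flat 2D map coordinates, the angle-of-view parameter and the maximum range are all hypothetical and are not taken from the system described in the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class CameraField:
    """Hypothetical camera field on a flat 2D map."""
    x: float                # camera position
    y: float
    direction_deg: float    # viewing direction, degrees from the +x axis
    view_angle_deg: float   # full horizontal angle of view (assumed known)
    max_range: float        # assumed maximum useful distance


@dataclass
class Event:
    """Event metadata as listed above: ID, type and time stamp."""
    event_id: int
    event_type: str
    time_stamp: float


def is_captured(field: CameraField, px: float, py: float) -> bool:
    """Return True if the point (px, py) lies inside the camera field."""
    dx, dy = px - field.x, py - field.y
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return True
    if dist > field.max_range:
        return False
    bearing = math.degrees(math.atan2(dy, dx))
    # Signed angular difference folded into [-180, 180).
    diff = (bearing - field.direction_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= field.view_angle_deg / 2.0
```

Combined with object positions from a geographic information system, a test of this kind would give the list of objects captured by each camera.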

Multiple perspective video has another feature, tempo-spatial sparseness.
The cameras do not cover the whole area completely; only some parts of it
are captured, sparsely in space, as illustrated in figure 1. Moreover, not
all of the video is recorded over a long time; only some parts of it,
especially important parts such as the latest interval and the interval
around an event, are stored.

A video server for multiple perspective video, such as the one described
in (Tsukada et al., 1997), captures video taken by many cameras, stores
the video data and the corresponding metadata, searches for the
appropriate video clips with the metadata, and transmits and displays
them. In this paper, we focus on this search process and consider
effective skimming methods for multiple perspective video.

2.2. PROBLEMS OF ITS SKIMMING AND 
OUR APPROACH 

Figure 1 Tempo-Spatial Sparseness of Multiple Perspective Video

There are various purposes for viewing multiple perspective video, such
as live observation, event analysis and video editing, and we need ways
to get and see the necessary parts of the video effectively for those
purposes. One of the important demands is to grasp in a short time what
happens in the area captured by many cameras. For that purpose, we need
an effective video skimming method for multiple perspective video that
extracts some important parts of the video and displays them. But several
problems must be solved for such video skimming.

1 More than several tens of cameras

A multi-video display method is often used in which multiple video
pictures are displayed simultaneously on a screen divided into n × n
video windows, such as 4 × 4. But we can watch only several pictures at
the same time, and it is very difficult to watch more than two pictures
closely at the same time. Furthermore, unimportant video on the screen
may distract from the important video. How to select the important
parts of video taken by several tens of cameras or more is an essential
problem.

2 Intuitive understanding in space

We cannot intuitively understand where a video scene was taken only
by displaying the video picture, as in the multi-video display method.
The problem is more serious if there are many cameras. Furthermore, it
is difficult to understand an event that occurs in a broad area and is
captured sparsely in space and time by many cameras. We are interested
in how to display multiple videos, each of which shows only part of the
whole area, so that the entire event can be understood well.

3 Summary of events that occur for a long time

Summarizing video data is an important issue even for single stream
video. But there are problems peculiar to multiple stream video because
the video streams are related to each other in time, space and
semantics. We are also interested in how to summarize multiple
perspective video and how to structure the summarized data, which
include the video parts and their metadata.







In this paper, we especially consider the first two problems for
applications in which the importance of each camera has a kind of locality
in space and time. The locality means that most of the cameras are not
important all the time; the importance of each camera changes in turn,
and the number of cameras to be watched at a time is limited. First in our
approach, the importance of video is calculated using the importance of
the elements captured in the video scene. One type of element is an object
such as a facility, a building, a person or a car, and another type is an
event, which is an action of an object or a change of its state, such as
someone intruding, a temperature rise or a batter hitting. The elements
have their own importance based on space, time and semantics, as described
in section 3. The video importance is decided by summing up the importance
of the elements captured in the video scene. Then some videos are selected
based on their importance and displayed with a map and three-dimensional
graphics of the objects for skimming multiple perspective video. Three
display methods are described in section 4.

3. TEMPO-SPATIAL IMPORTANCE 

As described before, the importance of video is decided by the importance
of the elements captured in the video scene. The importance of an element
depends on its position in space, the time when it is captured, and the
relationships among elements. In this section, we consider the
tempo-spatial importance of the elements and describe the calculation of
video importance.

3.1. TIME-DEPENDENT IMPORTANCE

We can divide the time-dependent importance broadly into two types. 

1 Importance in a micro-view of time 

An event is important not only at the moment when it occurs but also for
some time before and after it occurs. It therefore has an importance
distribution in time whose origin is the time when it occurs. Various
forms of distribution are conceivable depending on the kind of event and
the application. Generally speaking, an event is more important around
the origin and less important far from the origin, as with the normal
distribution in figure 2 (a). Figure 2 (b) is applicable if the state of
affairs before an event occurs is important. For example, we want to find
the cause of the event, but the sensor has low sensitivity and the cause
was captured in the video a long time before the event occurred. Figure 2
(c) can be used if the state after an event occurs is important. For
example, the cause of the event is known in advance and we want to see
the state caused by the event well. We call this kind of importance
importance in a micro-view of time because we pay attention only to one
event in a micro time interval.





(a) Normal Distribution

(b) Distribution with a Peak before an Event

(c) Distribution with Much Importance after an Event

Figure 2 Examples of Event Importance Distribution



2 Importance in a macro-view of time 

The event importance may change depending on time attributes and on
temporal relationships among events, as follows. We call this kind of
importance importance in a macro-view of time.

■ Importance depending on time attributes

The event importance may change with attributes of time such as a time
period and a season. For example, for security, an event of people
walking in an office building is important at midnight but not in the
daytime. Suppose that the importance distribution $w_C(t)$ for an event
class $C$ and an importance coefficient $C_A(t)$ that depends on time
attribute $A$ are given. Then the importance $W_i(t)$ of an event
instance $i$ that occurs at time $t_i$ is described by the following
equation.

$W_i(t) = C_A(t) \times w_C(t - t_i)$ (1)

■ Importance depending on temporal relationships among events

The event importance may also change with relationships between the
event and other events. For example, when many events of a kind occur in
a short period, the importance of the first and last few events may be
higher than that of the other events, as illustrated in figure 3. The
relationships vary among applications, and it is burdensome to define
such relationships by hand for each application. We want to find a
method that extracts features from any group of events and makes use of
those features to decide the importance depending on temporal
relationships among events. This is part of our future work.




Figure 3 Importance Depending on Temporal Relationships among Events
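Equation (1) can be illustrated with a short sketch. The distribution shapes, the night-time coefficient and all numeric values below are assumptions chosen for illustration; the paper does not specify them.

```python
import math


def normal_shape(dt, sigma=10.0):
    """w_C for figure 2 (a): peak at the event time, falling off on both sides."""
    return math.exp(-0.5 * (dt / sigma) ** 2)


def after_event_shape(dt, tau=30.0):
    """An assumed w_C in the spirit of figure 2 (c): zero before the event,
    decaying after it."""
    return math.exp(-dt / tau) if dt >= 0 else 0.0


def night_coefficient(t, day_length=86400.0):
    """Hypothetical C_A(t): events count more between 22:00 and 06:00.
    t is in seconds since midnight of some day."""
    hour = (t % day_length) / 3600.0
    return 2.0 if (hour >= 22.0 or hour < 6.0) else 0.5


def event_importance(t, t_i, w_c=normal_shape, c_a=night_coefficient):
    """Equation (1): W_i(t) = C_A(t) * w_C(t - t_i)."""
    return c_a(t) * w_c(t - t_i)
```

With these assumed values, a walking event detected at 23:00 has importance 2.0 at the moment it occurs, while the same event at noon has importance 0.5.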



3.2. SPACE-DEPENDENT IMPORTANCE

The space-dependent importance can also be classified into two types.

1 Importance in a micro-view of space

An element is important not only at the point where it exists but also
around it in space. It therefore has an importance distribution in space
whose origin is the place where it exists. Various forms of distribution
are applicable depending on the kind of element and the application. As
with the time-dependent importance, an element is in general more
important around the origin and less important far from the origin. An
example of the importance distribution in space is illustrated in figure
4 (a). We call this kind of importance importance in a micro-view of
space.

2 Importance in a macro-view of space

The importance of an element can change depending on space attributes
and on spatial relationships among elements, as follows. We call this
importance importance in a macro-view of space.

■ Importance depending on space attributes

The element importance may change with attributes of space such as
geographical mesh data. For example, a heavy rain and flood warning is
more important in a region with soft ground. Suppose that the importance
distribution $w_C(x, y)$ for an element class $C$ and an importance
coefficient $C_A(x, y)$ that depends on space attribute $A$ are given.
Then the importance $W_i(x, y)$ of an element instance $i$ at point
$(x_i, y_i)$ is described by the following equation.

$W_i(x, y) = C_A(x, y) \times w_C(x - x_i, y - y_i)$ (2)

■ Importance depending on spatial relationships among elements

The element importance may also change with relationships between the
element and other elements. For example, in a soccer game, a foul event
near a goal object is more important, and an area where some famous
players are playing close together may be important. It is also our
future work to find a method that extracts features from any group of
elements in space, and furthermore in time and space, and makes use of
those features to decide the importance depending on spatial or
tempo-spatial relationships among elements.
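Equation (2) can be sketched in the same way as equation (1). The isotropic Gaussian shape, the mesh layout, the cell size and the coefficient values are assumptions for illustration only.

```python
import math


def gaussian_2d(dx, dy, sigma=5.0):
    """Assumed w_C(dx, dy): isotropic falloff around the element position."""
    return math.exp(-0.5 * (dx * dx + dy * dy) / (sigma * sigma))


def mesh_coefficient(mesh, cell_size):
    """Build a hypothetical C_A(x, y) from geographical mesh data.

    mesh[row][col] holds the coefficient of one square cell; points
    outside the mesh get the neutral coefficient 1.0.
    """
    def c_a(x, y):
        row, col = int(y // cell_size), int(x // cell_size)
        if 0 <= row < len(mesh) and 0 <= col < len(mesh[0]):
            return mesh[row][col]
        return 1.0
    return c_a


def element_importance(x, y, x_i, y_i, c_a, w_c=gaussian_2d):
    """Equation (2): W_i(x, y) = C_A(x, y) * w_C(x - x_i, y - y_i)."""
    return c_a(x, y) * w_c(x - x_i, y - y_i)
```

For example, a mesh cell marked as soft ground could carry a coefficient of 3.0, so that a flood warning located in that cell is weighted three times as heavily as the same warning elsewhere.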
(a) Importance Distribution of Element (b) Capture Degree Distribution of Camera (c) Video Importance



Figure 4 Importance Calculation in Space 



3.3. CALCULATION OF VIDEO 
IMPORTANCE 

We explain the calculation method of video importance based on
tempo-spatial importance, especially based on importance in a micro-view
of time and space.

1 Calculation in space

The video importance is calculated based on the elements in the video
and how well those elements are captured in the video. For example,
suppose the importance distribution of an element has the form in
figure 4 (a). A camera also has a spatial distribution, as illustrated
in figure 4 (b), for calculating how well the elements are captured.
This distribution is decided by the camera field, such as the direction
and the focal length of the camera. We call it the capture degree. We
define the video importance for an element as the product of the
importance of the element and the capture degree of the camera, as
illustrated in figure 4 (c). The video importance is the sum of the
video importance for each element. Suppose that the importance of
element $j$ is $W_j$ and the capture degree of camera $i$ is $e_i$.
Then the importance $W_i$ of the video captured by camera $i$ is
described by the following equation.

$W_i = e_i \times \sum_j W_j$ (3)

2 Calculation in time

The video importance calculated in space has to be renewed periodically
because the event importance changes in time, as illustrated in figure
2. The display of multiple perspective video then changes automatically
as the replay of the video progresses. Suppose that the importance of
event $k$ at time $t$ is $W_k(t - t_k)$, where $t_k$ is the time when
the event occurs, and the capture degree of camera $i$ at time $t$ is
$e_i(t)$. Then the importance $W_i(t)$ of the video captured by camera
$i$ is described by the following equation.

$W_i(t) = e_i(t) \times \sum_k W_k(t - t_k)$ (4)

Controllable cameras and moving cameras, whose camera fields change, can
be handled with the time-dependent capture degree $e_i(t)$.
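The periodic recalculation in equation (4) can be sketched as below. The Gaussian event importance, its width and the capture degree functions are assumptions for illustration; equation (3) is the special case in which nothing depends on time.

```python
import math


def event_importance(dt, sigma=5.0):
    """Assumed W_k(t - t_k): normal-shaped importance around the event time."""
    return math.exp(-0.5 * (dt / sigma) ** 2)


def video_importance(t, capture_degree, event_times):
    """Equation (4): W_i(t) = e_i(t) * sum_k W_k(t - t_k).

    capture_degree is a function e_i(t); event_times lists the occurrence
    times t_k of the events considered for this camera.
    """
    return capture_degree(t) * sum(event_importance(t - t_k) for t_k in event_times)


def fixed_camera(t):
    """A fixed camera has a constant capture degree (hypothetical value)."""
    return 0.5


def panning_camera(t):
    """A hypothetical controllable camera that turns away after t = 20."""
    return 1.0 if t < 20.0 else 0.0
```

Calling `video_importance` at each replay tick for every camera yields the time-varying ranking from which the displayed videos are chosen.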

4. DISPLAY FOR THE VIDEO SKIMMING 

After the video importance is calculated, some videos are selected based
on their importance and displayed with a map and three-dimensional
graphics of the objects for skimming multiple perspective video
effectively. We explain three display methods below.

4.1. SYNCHRONIZED DISPLAY IN SPACE 

The N most important videos are selected and displayed on a map, where
each video window is placed near its camera field (see figure 11). The
videos are replayed synchronously based on the time when they were taken.
By watching what happens at several places through the synchronized replay
on a map, it is possible to understand the event more accurately in less
time. The number N should be altered interactively or automatically
according to the user’s intention, the screen layout (so that many video
windows do not hide the map), the display ability of the machine and so
forth. Furthermore, the following optional methods are useful.

■ Enhancement: The video window is displayed with enhancement such as
its size and color based on the video importance, so that more
important video is more conspicuous.

■ Quality: The quality of video, such as image resolution and frame
rate, is altered based on the video importance, so that more important
video is displayed more clearly and precisely. This is also useful for
using system resources such as CPU power and network bandwidth
effectively.

■ Map operation: A map operation such as scrolling and scaling is useful
for seeing a map from various viewpoints, so the display method should
correspond to the map operation. Namely, it alters the representative
videos to be displayed according to the displayed area of the map. The
video importance is independent of the displayed area of the map, but
there are two methods: the N representatives to be displayed are
selected either from all the videos or from the videos that capture
the displayed area.
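The selection and enhancement steps can be sketched as follows. The function names, the linear size scaling and its limits are assumptions; the paper does not state how the enhancement is computed.

```python
def select_top_n(importance, n, candidates=None):
    """Return the IDs of the n most important videos.

    importance maps a camera ID to its video importance. If candidates is
    given (for example, the cameras that capture the displayed map area),
    only those cameras are considered; otherwise all cameras are.
    """
    cams = importance if candidates is None else [c for c in importance if c in candidates]
    ranked = sorted(cams, key=lambda c: importance[c], reverse=True)
    return ranked[:n]


def window_scale(w, w_max, min_scale=0.5, max_scale=1.5):
    """Assumed enhancement rule: window size grows linearly with importance."""
    if w_max <= 0:
        return min_scale
    return min_scale + (max_scale - min_scale) * (w / w_max)
```

The optional `candidates` argument corresponds to the second of the two map-operation methods above; leaving it out corresponds to the first.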

4.2. TRACEABLE DISPLAY IN SPACE 

In tracking a moving object or observing a propagating phenomenon such as
smoke in a fire, it is possible to see the trace with the synchronized
display in space if the event importance for such a phenomenon is set to
a high level. In addition, it is effective to display the trace itself on
the map. The important videos change in turn as the replay progresses. We
can see the trace of moving objects by displaying a still image of each
video that was important in the past but has become unimportant and
disappeared (see figure 5).
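One possible bookkeeping scheme for the trace is sketched below. It is purely hypothetical: the paper notes later that the traceable display was not implemented in the prototype, and the function name and list-based representation are assumptions.

```python
def update_trace(previous, current, trace):
    """Update the list of cameras shown as still images.

    previous and current are the camera IDs displayed live before and
    after an importance update; trace is modified in place and returned.
    """
    # A video that was displayed but is no longer important leaves a still.
    for cam in previous:
        if cam not in current and cam not in trace:
            trace.append(cam)
    # A camera that becomes important again is shown live, not as a still.
    trace[:] = [cam for cam in trace if cam not in current]
    return trace
```

Because the stills are kept in the order in which the videos disappeared, drawing them on the map in list order would reproduce the path of the tracked object.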

4.3. SERIALIZED DISPLAY WITH SPACE 
NAVIGATION 

This display method is useful for looking at each important video more
closely and understanding the three-dimensional structure around the
videos and between them. In this method, the videos are displayed in the
order based on the video importance. Each video is also displayed in a
three-dimensional (3D) world that consists of 3D graphic objects around
the video. After one video is over, we walk through the 3D world to the
next video, as illustrated in figure 6. We have not considered the details
of the method yet. The order of the videos, the display time for each
video, the viewpoint in the 3D world and so on can be decided by several
factors besides the video importance, such as the place and time of the
video and the time requested for navigation. This is also part of our
future work.

Figure 5 Traceable Display in Space




Figure 6 Serialized Display with Space Navigation
5. PROTOTYPE

We developed a prototype to evaluate some of the methods proposed in this
paper. The system construction, the implemented functions and the
software construction are described below.

1 System construction

The system construction of the prototype is illustrated in figure 7.
Multiple perspective video is taken by multiple cameras and recorded on
video tapes. The video data are digitized by a capture board attached to
a personal computer, encoded and stored on a hard disk drive in the AVI
file format. A commercial video editing tool is used to edit the multiple
video streams, for example to adjust the start frame of each stream. The
metadata are created by hand with a text editor and stored in a text
file. When the skimming program that we implemented is started, the
system loads the map data, the metadata and the video data and displays
them on the screen. The skimming program is written in C++ and runs on a
personal computer with dual Pentium® II processors and Windows NT 4.0®.
The program uses the Windows Media™ Player Control to replay video.
Figure 7 System Construction of Prototype



2 Implemented functions 

The video importance is calculated based on the element importance
defined in a text file. In addition, we implemented a function that
raises the importance of video near the mouse position on the map. The
video importance is then calculated based on the element importance and
the mouse position. This function enables a user to change the video
importance dynamically based on the place that the user wants to look at
closely. It is useful for seeing the neighborhood of an important video
or an important object. After checking whether each element is in a
camera field or not, namely clipping, the importance in space is
calculated by the following equation using only the importance of the
elements in the camera field (see figure 8).

$W_i = \sum_j (W_j \times F_i) / (l_{ij} \times S_i)$ (5)

In this equation, $W_j$ is the importance of element $j$, $S_i$ is the
size of the imaging device, $F_i$ is the focal length of camera $i$, and
$l_{ij}$ is the distance between camera $i$ and element $j$. This
equation approximately represents the size of the element projected on
the focal plane. A normal distribution is used for the event importance.




Figure 8 Clipping Element in Calculation of Video Importance 



Three types of element (object, event and mouse position) can be selected,
and the coefficient of each importance can be specified interactively for
experimenting with several variations of the importance calculation. The
importance is recalculated periodically to cope with the change of the
event importance in time, and the screen is renewed. The importance is
also recalculated at every occurrence of a mouse event to cope with mouse
movement. The N most important videos are selected and displayed on a map
with size and frame color enhancement (see figure 11). The videos are
replayed synchronously based on the time when they were taken. Several
parameters of the video importance calculation, such as the number of
video windows N, can be altered interactively while the skimming program
is running. The map and the video windows on it can be scrolled and
scaled.
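The clipping step and equation (5) can be sketched as follows. The prototype itself is written in C++; this Python sketch only illustrates the calculation. The 2D clipping test uses the standard angle-of-view relation, half angle = atan(S / 2F), which is an assumption here since the paper does not say how clipping is performed, and all names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    x: float
    y: float
    direction_deg: float   # viewing direction on the map, degrees from +x
    focal_length: float    # F_i, in the same unit as sensor_size
    sensor_size: float     # S_i, size of the imaging device


def in_camera_field(cam, ex, ey):
    """Assumed clipping test: the element lies within the angle of view."""
    half_view = math.atan(cam.sensor_size / (2.0 * cam.focal_length))
    bearing = math.atan2(ey - cam.y, ex - cam.x)
    diff = bearing - math.radians(cam.direction_deg)
    diff = (diff + math.pi) % (2.0 * math.pi) - math.pi  # fold into [-pi, pi)
    return abs(diff) <= half_view


def video_importance(cam, elements):
    """Equation (5): W_i = sum_j (W_j * F_i) / (l_ij * S_i) over visible elements.

    elements is a list of (x, y, W_j) tuples.
    """
    total = 0.0
    for ex, ey, w_j in elements:
        l_ij = math.hypot(ex - cam.x, ey - cam.y)
        # Skip clipped elements and the degenerate zero-distance case.
        if l_ij == 0.0 or not in_camera_field(cam, ex, ey):
            continue
        total += (w_j * cam.focal_length) / (l_ij * cam.sensor_size)
    return total
```

In such a sketch the mouse position could be handled as one more element with its own coefficient, in line with the three element types described above.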

3 Software construction 
The skimming program consists of four threads, as illustrated in figure
9. The interaction thread is the main thread of the program. It reads
data files such as the metadata and creates internal work tables in an
initialization process. After initialization, it receives the user’s
operations, such as replay on the control panel and mouse movement on
the map window, and sends messages to the other threads. For example, it
sends a replay message to the display thread when the replay button is
clicked on the control panel, and it sends a video change message to the
importance thread when the mouse pointer is moved on the map window. The
interaction thread also displays the map. The importance thread receives
messages from the interaction thread and the event thread, calculates
the video importance and updates the importance table, which describes
the importance and ranking of each video. The display thread receives
messages from the interaction thread and the importance thread, updates
the display table, which describes how to display each video based on
the video importance, and updates the screen. The event thread monitors
the captured time of the video being replayed and the occurrence times
of events, and sends a message to the importance thread if an event
exists within a certain amount of replay time. The display thread uses
the Media Player Control to display multiple videos synchronously. The
controls read and replay each video in accordance with given parameters
such as a file name and a start time. When the display thread starts to
replay a new video because the video importance has changed, it gets the
current playback time from one of the controls that is still replaying
a video and gives the playback time as the start time to the new control.
The Media Player Control does not have an exact synchronized replay
ability, and neither does this implementation. But it is sufficient for
evaluating the skimming methods that display several video windows
simultaneously, as described in section 6.
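The message flow between the importance thread and the display thread can be sketched with queues, as below. This is a simplified illustration in Python rather than the C++ of the prototype: the message format, the stop sentinel and the function names are assumptions, and the interaction and event threads are reduced to a few messages posted by the caller.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel that shuts the workers down


def importance_worker(inbox, display_inbox, table):
    """Update the importance table and forward the new ranking."""
    while True:
        msg = inbox.get()
        if msg is STOP:
            display_inbox.put(STOP)
            break
        camera, importance = msg
        table[camera] = importance
        ranking = sorted(table, key=table.get, reverse=True)
        display_inbox.put(ranking)


def display_worker(inbox, shown, n):
    """Keep the n top-ranked cameras as the videos to display."""
    while True:
        msg = inbox.get()
        if msg is STOP:
            break
        shown[:] = msg[:n]


def run_demo():
    importance_inbox, display_inbox = queue.Queue(), queue.Queue()
    table, shown = {}, []
    workers = [
        threading.Thread(target=importance_worker,
                         args=(importance_inbox, display_inbox, table)),
        threading.Thread(target=display_worker, args=(display_inbox, shown, 2)),
    ]
    for worker in workers:
        worker.start()
    # The interaction and event threads would post messages like these.
    for msg in [("cam1", 0.2), ("cam2", 0.9), ("cam3", 0.5), STOP]:
        importance_inbox.put(msg)
    for worker in workers:
        worker.join()
    return shown
```

Each queue has a single consumer and preserves message order, so the display table always reflects the most recent ranking that the importance worker produced.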

6. EXPERIMENTAL RESULT 

With the prototype, we conducted two experiments intended to evaluate
our skimming method, especially for moving objects. We describe the
outline of the experiments and the results.

6.1. EXPERIMENT 1 

Figure 9 Software Construction of Skimming Program

With nine video cameras, we sparsely captured multiple perspective video
of a scene in which two toy cars and two toy trains were running on rails
and sometimes collided with each other (see figure 10). We defined two
types of event: object colliding and object passing. We found the
following.

■ When all nine video streams were displayed at the same time with- 
out any importance as illustrated in figure 11(a), the eyes were 
changed one after another to the video that captured a moving 
toy. We sometimes missed to see a collision because we paid much 
attention to a toy passing. We could observe four video windows 
simultaneously in the neighborhood which area was about lOOcm^ 
on a 21” monitor, without overlooking any event on each video. 

■ When an event importance of collision was set higher than that of 
passage and the four of the most important video were displayed 
with enhancement as illustrated in figure 11(b), we did not pay so 
much attention to a passage that we did not overlook a collision. 
This method has an effect that a user can know events with less 
importance without overlooking events with higher importance. 

■ It was easy to understand how a toy moves around on the map if an 
event importance of passage was set high. But we misunderstood 
the moving direction of the toy if its direction on the map and its 
direction on the video displayed on the map were not matched. 

■ It was desirable to change the size of a video window slowly, because 
changing it rapidly and frequently tired the eyes. 

Figure 10 Experiment 1: Synchronized Display in Space with Toy Trains 

Figure 11 Skimming Examples in Experiment 1: (a) Synchronized Display in 
Space without Importance; (b) Synchronized Display in Space with Enhancement 

6.2. EXPERIMENT 2 

With ten video cameras, we sparsely recorded multiple-perspective 
video of a scene in which several people walked around one floor of a building. 
We assumed two scenarios. In the first, a suspicious man 
walked around at night while only a few other people were walking. We defined 
an event as someone passing in front of a camera. We found the 
following. 

■ The videos that captured the suspicious man were displayed one 
after another as the event importance changed, as illustrated in 
Figure 12(b), so we could easily follow him on the map. Videos that 
captured other people were sometimes displayed as well, but they did 
not distract us much. The traceable display in 
space has not been implemented yet, but this display method was 
enough for us to remember the path he had taken. 

■ When he stayed in the blind spot of the cameras for a while, the 
importance of the video decreased, so the video window 
disappeared and no other video was displayed. Some 
measure is necessary to avoid this problem. 

In the second scenario, the suspicious man walked around in the daytime 
while many people were walking. Only the mouse position was used for the 
importance calculation, so we followed him interactively. 

■ We could easily follow him by operating the mouse. This 
operation worked very well because the video we wanted to see could 
be displayed simply by moving the mouse near the place of interest. 

■ Inappropriate videos were sometimes displayed because the camera 
field was decided only by the position, direction and focus of the 
camera, and the building structure was not considered. 
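
The last observation can be made concrete with a small sketch. It is not the prototype's code: the `Camera` fields, the `reach` limit and the sector test are assumptions chosen only to illustrate a camera field that depends on position, direction and angle of view alone. Because walls are ignored, a point behind a wall is still reported as covered, which is the kind of inappropriate display noted above.

```python
import math
from dataclasses import dataclass


@dataclass
class Camera:
    """Hypothetical camera description on a 2D floor map."""
    x: float
    y: float
    direction_deg: float  # viewing direction on the map
    fov_deg: float        # horizontal angle of view
    reach: float          # assumed maximum useful distance


def in_camera_field(cam: Camera, px: float, py: float) -> bool:
    """True if (px, py) lies in the camera's sector; walls are ignored."""
    dx, dy = px - cam.x, py - cam.y
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return True
    if dist > cam.reach:
        return False
    bearing = math.degrees(math.atan2(dy, dx))
    # Smallest signed difference between the bearing and the viewing direction.
    diff = (bearing - cam.direction_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= cam.fov_deg / 2.0


def cameras_covering(cams, px: float, py: float):
    """Indices of cameras whose field contains the (mouse) position."""
    return [i for i, cam in enumerate(cams) if in_camera_field(cam, px, py)]


if __name__ == "__main__":
    cams = [Camera(0.0, 0.0, 0.0, 60.0, 10.0), Camera(0.0, 0.0, 170.0, 60.0, 10.0)]
    print(cameras_covering(cams, 5.0, 0.0))
```

Taking the building structure into account would require an additional visibility test against wall segments, which this sketch deliberately leaves out.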



7. CONCLUSIONS 

In this paper, we introduced a video importance that is calculated from the 
importance of the elements captured in the video, and then applied 
it to display methods that use a map and a 3D world for 
skimming multiple-perspective video effectively. The experiments described 
in Section 6 showed that some of the proposed methods are effective. 
Those experiments, however, are basic and primitive. We want 
to evaluate the methods in practical experiments or real applications that are 
larger in scale, involve several tens of cameras or more, include complicated 
combinations of events, and so on. We are also planning to study a 
method that extracts features from a group of elements and uses 
them to decide the tempo-spatial importance, as well as the serialized 
display method with space navigation. 

Figure 12 Skimming Examples in Experiment 2 
Abstract This paper proposes a framework of spatial hypermedia systems to augment 
and reduce the real world by using networked remote live videos and spatial 
databases. Augmenting the real world means that an information system pro- 
vides some additional information with a live video. On the other hand, reducing 
the real world means providing less information for some parts in a live video. 
The reduced reality is important for privacy of networked live videos. This paper 
describes the principle of realizing the spatial hypermedia systems and presents 
our experimental networked system Name-at. 

Keywords: Augmented Reality, Hypermedia, Internet, Video Streaming, Privacy 

1. INTRODUCTION 

Recently, rich environments of multimedia data have gradually become 
available. Because of high-speed networks, we can deal with video and voice on 
the Internet. These enable us to participate in video conferencing, distance 
learning and so on. To enhance our participation, we have developed what we refer 
to as augmented hypermedia, i.e., we augment hypermedia with videos which 
we can obtain by using a remotely controllable camera. For instance, when we 
watch a live video by means of computers, the video displays annotations of 
objects which exist in the live video. The annotations consist of virtual objects. 
We can treat our real space as hypermedia when we add anchors to the virtual 
objects. In this paper, we call such systems spatial hypermedia. 

In order to annotate live videos, image recognition is considered important 
for positioning virtual objects, but it is generally difficult to achieve real-time and 
precise positioning by image recognition. This paper proposes a new 
framework of spatial hypermedia that augments or reduces real space information 
based on remote live videos and spatial databases. By means of the framework, 
remote live videos can be annotated without image recognition. The 
framework also enables us to maintain our privacy by forbidding users to control a 
remote camera in certain conditions or by editing the video. To demonstrate 
that our framework is useful, we have developed an experimental application. 

2. FUNCTIONS OF SPATIAL HYPERMEDIA 

The following are functions of spatial hypermedia that we propose: 

■ Annotation: Positioning Text Characters on Live Videos 

The text characters represent the annotations of objects in the live video 
(Figure 1). Annotations follow the real objects in the video whenever the 
direction or the zooming ratio of the camera changes. This is the 
most important function of the application. 

Figure 1 Augmented reality as composition of a live video and text characters. 

■ Entry and Deletion of Information in an Augmented Real Space 

Spatial databases can be changed through the live video. To add 
a new piece of data to the database, we create a new annotation in the 
live video. Deleting an annotation deletes the corresponding data from 
the database. 

■ Control of the Remote Camera 

A user can control the direction and the zooming ratio of a remote cam- 
era, and direct it toward his intended objects by selecting an entry from 
a visual menu. 

■ Clickable Augmented Real Space 

An annotation is clickable and is associated with its corresponding WWW 
page. This function turns the real world into a kind of hypermedia. 

■ Levels of Detail (LoD) 

When we zoom in on some objects using a video camera, we can 
obtain more detail about the objects as additional CG data composited with the 
live video. On the other hand, such CG objects are not displayed at a lower 
zooming ratio. LoD controls the quality of the graphical representations of 
CG objects. 

■ Protecting Privacy for Remote Live Videos 

Ensuring the privacy of remote live videos is an important issue on the Internet. 
The technique of placing text characters in a video can be applied to 
protecting the privacy of remote live videos by replacing a part of the 
image in the video. For example, a building in a live video is replaced 
by a low-resolution image or by a past image of it (Figure 2). 

Figure 3 shows the basic configuration of spatial hypermedia. The real-world 
area represented by the video frame can be calculated from the data on 
the condition of the camera, e.g., its direction and zooming ratio. 
Information about objects in the video is retrieved using the camera condition 
data. Finally, a user obtains augmented reality, which consists of a real 
video and CG text characters. 

3. HOW TO REALIZE FUNDAMENTAL FUNCTIONS 
3.1. POSITIONING TEXT CHARACTERS 

The positions of text characters for annotations can be calculated from the 
camera parameters and the databases. When a user registers a new annotation in the 
video, both the (x, y) position of the annotation on the display screen and the 
camera parameters are stored in the databases. 

Figure 2 An example of reduced reality for privacy. 

Figure 3 Basic configuration of spatial hypermedia. 

3.1.1 Changing the Zooming Ratio. When the camera zooms in or out, 
objects in the live video move within the frame. Annotations should be drawn at 
the proper positions as the zooming ratio of the camera changes. Figure 
4 shows two videos, (i) and (ii), with different zooming ratios a and b, where 
a < b. A dotted line represents the center of the video's width. Suppose that the 
distances between the object (the gray rectangle) and the center of the video's 
width (the dotted line) are s and t, respectively, and that the half width of the 
video is 1. 



Figure 4 Movement of an object in changing the zooming ratio. 



Figure 5 shows two relations between the video camera and the object, which 
correspond to (i) and (ii) in Figure 4, respectively. As Figure 4 shows, 
the region that the video covers with the zooming ratio b is smaller than 
that with the zooming ratio a. Suppose that the distance between the object 
and the center of the projection plane pointed to by the viewing vector is k in 
Figure 5. Furthermore, in the case of (i), the width from the center to one edge 
of the viewing plane is u, and in the case of (ii), that width is v. The values u 
and v can be calculated from the zooming ratio of the camera; this 
relation is explained in Section 4. u and v can be expressed as $u = f(a)$, $v = f(b)$ 
using the function f, which gives the width of the video plane from the 
zooming ratio. 

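We now consider the relation between the zoom values a, b and the distances s, t 
in the video, using the ratios of these values in Figures 4 and 5. Because the half 
width of the video is normalized to 1, the same real offset k appears as $s = k/u$ 
at the zooming ratio a and as $t = k/v$ at the zooming ratio b, which gives 

$$\frac{s}{t} = \frac{v}{u} = \frac{f(b)}{f(a)}$$

The equation above shows that the locations where annotations should be 
drawn can be calculated from the zoom values, if they are known. 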
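
As a minimal sketch of this repositioning, the following assumes that the real offset k stays fixed, so that s · f(a) = t · f(b). The function f is taken from the quadratic fit reported in Section 4, with its leading coefficient read as 9.190750 × 10⁻⁶ (an assumption about a garbled exponent); any constant factor between the vertical and horizontal widths cancels in the ratio. The names `half_width_tangent` and `rescale_offset` are hypothetical.

```python
def half_width_tangent(zoom: float) -> float:
    """Assumed form of f: quadratic fit of Section 4 (tangent of the half angle)."""
    return 9.190750e-6 * zoom**2 - 0.003129416 * zoom + 0.2899015


def rescale_offset(s: float, zoom_a: float, zoom_b: float) -> float:
    """Offset t at zoom_b of a point seen at offset s at zoom_a.

    Offsets are measured from the frame center with the half width equal to 1.
    Uses s * f(a) = t * f(b), i.e. the real offset k is unchanged by zooming.
    """
    return s * half_width_tangent(zoom_a) / half_width_tangent(zoom_b)


if __name__ == "__main__":
    t = rescale_offset(0.25, zoom_a=18, zoom_b=64)
    print(f"offset at zoom 64: {t:.3f}")
```

A result larger than 1 in absolute value means that the annotation has left the visible frame and need not be drawn.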



Object 



(i) (ii) 




Figure 5 Relation between the field-of-view angle and the zooming ratio. 



3.1.2 Changing of the Pan and Tilt Angles. When the video camera 
pans or tilts, annotations should follow the objects. To realize this function, 
a user needs to indicate where the object is in the video by drawing a 
rectangle using the GUI (Graphical User Interface). The user encloses the 
object with a rectangle twice, moving the camera in between, so that the second 
time the object appears at a different position from the first. The angles by 
which the camera has turned and the distance which the object has moved in the 
video are obtained from this operation, and are then used to calculate the virtual 
viewing distance. Using the viewing distance, we can then calculate the distance 
which the object moves in the video for a given angle of pan and tilt. 
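
The calculation can be sketched as follows. This is an illustration under assumptions that the text does not state: a pinhole camera model and an object near the optical axis, so that a rotation by an angle θ shifts the image by about d · tan θ, where d is the virtual viewing distance in pixels. The function names are hypothetical, and the sketch is not claimed to reproduce the prototype's exact formula.

```python
import math


def virtual_viewing_distance(shift_px: float, delta_angle_deg: float) -> float:
    """Estimate the virtual viewing distance d (in pixels) from one camera move.

    shift_px is how far the object moved in the image between the two
    rectangles; delta_angle_deg is the pan (or tilt) change between them.
    """
    return abs(shift_px) / math.tan(math.radians(abs(delta_angle_deg)))


def shift_for_angle(distance_px: float, delta_angle_deg: float) -> float:
    """Predict how far (in pixels) an annotation moves for a pan or tilt change."""
    return distance_px * math.tan(math.radians(delta_angle_deg))


if __name__ == "__main__":
    d = virtual_viewing_distance(shift_px=80.0, delta_angle_deg=10.0)
    print(f"virtual viewing distance: {d:.1f} px")
    print(f"shift for a 5 degree pan: {shift_for_angle(d, 5.0):.1f} px")
```

The viewing distance also depends on the zooming ratio, so in practice it would be stored together with the zoom value at which the two rectangles were drawn.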

3.2. LEVELS OF DETAIL (LOD) 

LoD in this paper depends on the zooming ratio of the video camera and the 
size of some real world object. The levels are decided by the area which the 
object occupies in the video; the way to express annotations depends on the 
levels. 

Networked Augmented Spatial Hypermedia System on Internet 245 

The area of the object can be found by drawing a rectangle over the object 
using the GUI. If the camera's zooming ratio changes, the area of the object 
also changes; this is handled by the relation given in 3.1.1. For example, in 
the case of three levels of detail, we can decide the level from the area using the 
algorithm below. 

if (area > Area_MAX) 
    level = Level_MAX; 
else if (area > Area_MID && area <= Area_MAX) 
    level = Level_MID; 
else if (area <= Area_MID) 
    level = Level_MIN; 

3.3. FUNCTION FOR PROTECTING PRIVACY IN 
VIDEOS 

The purpose of this function is to protect the privacy of parts displayed in videos by 
concealing some partial information in the videos. The camera parameters are 
used to determine the area where the camera should not zoom in; the function 
either limits the zooming ratio to a proper value, or prevents users from 
viewing the private part of the video. 

Consider the width w of the real-world area that we see in the video 
and the precision p of the video. The more the camera zooms in, the smaller 
w becomes. w can be calculated from the zooming ratio (zoom) of the camera, 
so w is expressed as $w = f(\mathit{zoom})$ using the function f. The precision p 
is defined as the real-world distance, or area, 
that corresponds to the minimum unit of the live video. Generally speaking, 
the minimum unit of the video corresponds to one pixel. The relation 
between w and p is given by equation (1) using the fixed number $C_1$. The 
equation represents the natural fact that we can obtain more detail about objects in 
the video with a higher zooming ratio of the camera. 

$$\frac{w}{p} = C_1 \tag{1}$$

When we think of privacy, the precision p should be more than the fixed 
number $C_2$. 

$$p > C_2 \tag{2}$$

$C_2$ depends on the visible area, so $C_2$ can be expressed as 
$C_2 = g(\mathit{pan}, \mathit{tilt})$ using the function g and the camera's angles pan 
and tilt. From equations (1) and (2), we obtain equation (3). Because w becomes 
smaller as the camera zooms in, f is decreasing, and inverting it reverses the 
inequality, which gives equation (4). 

$$w = f(\mathit{zoom}) > C_1 \cdot C_2 = C_1 \cdot g(\mathit{pan}, \mathit{tilt}) \tag{3}$$

$$\mathit{zoom} < f^{-1}\left(C_1 \cdot g(\mathit{pan}, \mathit{tilt})\right) \tag{4}$$

Given the angles pan and tilt, equations (3) and (4) determine whether the 
zooming ratio is appropriate. 
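
A minimal sketch of how equations (3) and (4) could be used to limit the zoom is given below. It is not the prototype's code. It assumes that f is the quadratic fit of Section 4 (leading coefficient read as 9.190750 × 10⁻⁶) with any proportionality constant absorbed into $C_1$, that the zoom value ranges over 0 to 128, and that the product $C_1 \cdot g(\mathit{pan}, \mathit{tilt})$ is supplied by the caller. The function names and the numeric thresholds are hypothetical, and the boundary case is treated as allowed.

```python
def view_width(zoom: float) -> float:
    """Assumed w = f(zoom): quadratic fit of Section 4, decreasing on 0..128."""
    return 9.190750e-6 * zoom**2 - 0.003129416 * zoom + 0.2899015


def max_allowed_zoom(min_width: float, lo: float = 0.0, hi: float = 128.0) -> float:
    """Largest zoom in [lo, hi] whose view width is still >= min_width.

    min_width stands for C1 * g(pan, tilt). Bisection is valid here because
    view_width is monotonically decreasing on the interval.
    """
    if view_width(lo) < min_width:
        raise ValueError("even the widest view is too detailed for this area")
    if view_width(hi) >= min_width:
        return hi
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if view_width(mid) >= min_width:
            lo = mid
        else:
            hi = mid
    return lo


def clamp_zoom(requested: float, c1: float, c2: float) -> float:
    """Clamp a requested zoom so that f(zoom) >= C1 * C2, following (3) and (4)."""
    return min(requested, max_allowed_zoom(c1 * c2))


if __name__ == "__main__":
    print(f"allowed zoom: {clamp_zoom(128.0, c1=1.0, c2=0.15):.1f}")
```

Numerical inversion is used instead of a closed form so that the same code works if f is replaced by another monotonic calibration curve.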
4. EXPERIMENT FOR MEASURING THE 
CHARACTERISTICS OF A VIDEO CAMERA 

We can obtain the camera's parameters, that is, the angles of pan and tilt and the 
zooming ratio. We used a CANON VC-C1 camera. This camera has a movable 
pan-tilt head which can be controlled from the computer through an RS-232C 
serial interface. The pan angle ranges from 0 to 100 degrees, and the tilt angle 
ranges from 0 to 40 degrees. The zoom value ranges from 0 to 128. 
The smaller the value is, the wider the area shown in the video. 

The field-of-view angle of the camera depends on the zooming ratio (Figure 
5). For example, when the camera zooms in, objects that fall outside the view 
can no longer be seen, and the objects at the center of the view look larger. On the 
other hand, when the camera zooms out, the visible area spreads and the objects 
in the video look smaller. We carried out an experiment to find the relation 
between zooming ratios and field-of-view angles. Although some distortion 
usually occurs with camera lenses, we do not consider it in our 
research. 

First, we set the experimental grid in front of the camera and measured the 
distance between the camera and the grid. Next, we measured the length of 
the grid on the computer display and obtained the field-of-view angle (more 
exactly, the tangent of the field-of-view angle). The graph (Figure 6) shows the 
tangent of the half angle of the vertical field of view. The measured values are 
plotted in the graph. We obtained the following function, which calculates the 
tangent of the field-of-view angle from the zooming value x. 

$$f(x) = 9.190750 \times 10^{-6}\,x^2 - 0.003129416\,x + 0.2899015$$

Thanks to the above function, we can determine the field-of-view angles from 
the camera's zooming value. 
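
For illustration, the fitted function can be evaluated as below. The sketch assumes that the leading coefficient is 9.190750 × 10⁻⁶, which is the reading under which the curve decreases over the whole zoom range 0 to 128, and that f gives the tangent of the half angle of the vertical field of view, as stated for Figure 6. The function names are hypothetical.

```python
import math


def tan_half_fov(zoom: float) -> float:
    """Tangent of the half vertical field-of-view angle for a zoom value 0..128."""
    if not 0 <= zoom <= 128:
        raise ValueError("zoom value must be between 0 and 128")
    return 9.190750e-6 * zoom**2 - 0.003129416 * zoom + 0.2899015


def vertical_fov_degrees(zoom: float) -> float:
    """Full vertical field-of-view angle in degrees."""
    return 2.0 * math.degrees(math.atan(tan_half_fov(zoom)))


if __name__ == "__main__":
    for zoom in (0, 64, 128):
        print(zoom, round(vertical_fov_degrees(zoom), 2))
```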

5. A PROTOTYPE SYSTEM NAME-AT 

We call our prototype of a spatial hypermedia application Name-at. The 
name is derived from the operation of registering names at the objects in the 
video, like putting memos on a desk. 

5.1. EQUIPMENT FOR IMPLEMENTATION 

To realize the prototype of the spatial hypermedia system, we use 
personal computers (PCs) and video cameras which can be controlled by PCs. 
The pan-tilt head under the camera, connected to the computer through RS-232C, 
controls the camera's direction, zooming ratio and so on. 

We can realize spatial hypermedia for video of a distant view 
taken with the remotely controllable camera by using the virtual video capture 
device which we developed. This device enables us to use remote equipment 
as if it were connected locally. Thanks to this device, we can easily get 
image data with the read() system call, and pass the camera control parameters 
(pan, tilt, zoom) through the ioctl() system call to control a remote video 
camera. 

Figure 6 

5.2. AN EXAMPLE OF USING NAME-AT 

Figure 7 shows a view of the Name-at GUI. There are two scroll bars for each 
of the camera parameters, that is, tilt, pan and zoom. The thick scroll bars are used 
by users for setting the camera parameters, and the thin ones show the present 
parameters of the remote camera. A user can control a remote video camera 
with them. If the spatial database includes data on objects in the video, their 
names are drawn on the objects as text characters. The LoD function is applied 
as described in Section 3.2. Users can also change the color of the text characters. 

Now, we explain how to create annotation data for the objects in the 
video. Using the GUI, a user encloses the target object with a rectangle and enters 
some data, that is, Name, Info, and URL. The user then pans the camera and 
encloses the object again. This operation gives the virtual distance from the 
camera to the object and enables the system to place the text characters at the 
proper position. In addition, a user can click an annotation with the mouse and 
browse the indicated web page when the data includes its URL. 

Figures 8 and 9 show examples of live videos with annotations. When we 
zoom in on the video of Figure 8, we get the video shown in Figure 9, with 
more information given as URL strings below the names. While Figure 9 shows 
the URL of ‘SeaHawk-HOTEL’ as well as its name, Figure 8 shows only the 
name of the object. This is realized by the LoD function. In the prototype, 
whether names and URLs are shown depends on the size of the area 
occupied by the object in the video. 




Figure 7 Overview of the GUI of Name-at. 

Figure 8 A live video with names of objects. 

Figure 9 A live video with names and URLs of objects. 

6. SPATIAL EXTENSION TO WEB USING XML 

The WWW (World Wide Web), which mainly consists of HTTP, URL and HTML, 
is the most popular hypermedia system at present. Spatial hypermedia can 
become more useful by being integrated as a WWW extension. 

In this section, we propose a framework of an extended spatial hypermedia 
system which is linked to the WWW. This enables us to use spatial hypermedia 
whenever we browse web pages and look for live videos on the Internet. In order 
to realize this framework, a protocol which controls videos and the 
operation of video cameras is required. Moreover, the URL should be extended 
to indicate videos of the real world which are captured by video 
cameras connected to the Internet. We construct a language for 
integrated spatial hypermedia on the Internet with XML (eXtensible Markup 
Language). The following is the DTD (Document Type Definition) describing a 
spatial hypermedia database based on our prototype. (The DTD of extended spatial 
hypermedia is given in the Appendix.) 

Spatial hypermedia 

<!ELEMENT natlist (nat*)> 
<!ATTLIST natlist camera CDATA> 

‘camera’ indicates a video camera on a network, which captures a live video 
for spatial hypermedia. The number of annotations is zero or more. ‘nat’ 
is an abbreviation of Name-at. 

Objects in the real world 

<!ELEMENT nat (lod, area, 3d)> 
<!ATTLIST nat name CDATA 
              url  CDATA 
              info CDATA> 

The object data consists of the following three parameters: the level of 
representation detail ‘lod,’ the information about the rectangles enclosing the 
specified object ‘area,’ and the virtual distance between the camera and the 
object ‘3d.’ The attributes ‘name,’ ‘url,’ and ‘info’ represent the name, the URL, 
and the additional information about the object, respectively. 

The following are the details of the tags of the object ‘nat.’ 

lod 

<!ATTLIST lod type (size | zoom | privacy)> 

This tag represents the kind of LoD. The level is decided by the area which 
the object occupies (‘size’), or by the zooming ratio of the camera (‘zoom’). If 
‘privacy’ is indicated, the object cannot be displayed at a spatial resolution 
higher than the one set for it. 



area 

<!ELEMENT area (camera, rectangle)> 
<!ATTLIST area number (1|2)> 
<!ATTLIST camera pan  CDATA 
                 tilt CDATA 
                 zoom CDATA> 
<!ATTLIST rectangle x0 CDATA 
                    y0 CDATA 
                    x1 CDATA 
                    y1 CDATA> 

This tag holds the information about the rectangles that a user draws in 
order to indicate the object. ‘camera’ represents the condition of the camera, and 
‘rectangle’ gives the coordinates of the rectangle. 

3d 

<!ATTLIST 3d distance CDATA> 

The attribute ‘distance’ gives the virtual distance between the camera and 
the object. 

We show an example of XML that follows the DTD mentioned above. 

<nat name="SeaHawk-HOTEL" 
     url="http://www.hawkstown.com/seahawks/" 
     info="Momochi"> 
  <lod type="size"> 
  <area number="1"> 
    <camera pan="33" tilt="21" zoom="18"> 
    <rectangle x0="113" y0="65" 
               x1="211" y1="129"> 
  </area> 
  <area number="2"> 
    <camera pan="42" tilt="19" zoom="18"> 
    <rectangle x0="29" y0="64" 
               x1="125" y1="127"> 
  </area> 
  <3d distance="543.358521"> 
</nat> 

This represents the object named ‘SeaHawk-HOTEL,’ which has a virtual 
distance of 543.358521 from the camera. Whether the name and the URL are 
displayed depends on the extent of the object in the current video. 
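
As a sketch of how such a record could be consumed, the following reads the example with Python's standard `xml.etree.ElementTree` module. Two adaptations are assumptions of this sketch and not part of the proposed language: empty elements are written in self-closing form, and the element `3d` is renamed to the hypothetical `dist3d`, because an XML name cannot begin with a digit and a standard parser would reject it. The helper names `parse_nat` and `rect_area` are also hypothetical.

```python
import xml.etree.ElementTree as ET

# Well-formed adaptation of the example record. "dist3d" is a hypothetical
# stand-in for the "3d" element, which is not a legal XML name.
RECORD = """
<nat name="SeaHawk-HOTEL" url="http://www.hawkstown.com/seahawks/" info="Momochi">
  <lod type="size"/>
  <area number="1">
    <camera pan="33" tilt="21" zoom="18"/>
    <rectangle x0="113" y0="65" x1="211" y1="129"/>
  </area>
  <area number="2">
    <camera pan="42" tilt="19" zoom="18"/>
    <rectangle x0="29" y0="64" x1="125" y1="127"/>
  </area>
  <dist3d distance="543.358521"/>
</nat>
"""


def parse_nat(xml_text: str) -> dict:
    """Read one annotation record into a plain dictionary."""
    nat = ET.fromstring(xml_text)
    areas = []
    for area in nat.findall("area"):
        cam = area.find("camera")
        rect = area.find("rectangle")
        areas.append({
            "number": int(area.get("number")),
            "camera": {k: int(cam.get(k)) for k in ("pan", "tilt", "zoom")},
            "rect": {k: int(rect.get(k)) for k in ("x0", "y0", "x1", "y1")},
        })
    return {
        "name": nat.get("name"),
        "url": nat.get("url"),
        "info": nat.get("info"),
        "lod": nat.find("lod").get("type"),
        "areas": areas,
        "distance": float(nat.find("dist3d").get("distance")),
    }


def rect_area(rect: dict) -> int:
    """Pixel area of an enclosing rectangle, the quantity used by size-based LoD."""
    return (rect["x1"] - rect["x0"]) * (rect["y1"] - rect["y0"])


if __name__ == "__main__":
    record = parse_nat(RECORD)
    print(record["name"], record["distance"])
    print([rect_area(a["rect"]) for a in record["areas"]])
```

The rectangle areas computed here are the values that a size-based LoD rule, such as the one in Section 3.2, would compare against its thresholds.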

Thanks to the proposed language for describing spatial information, we can 
use networked video cameras as devices which form a part of spatial hypermedia. 
In addition, multiple users can create spatial data and integrate them into a 
larger database which can be shared by many users. 

7. RELATED WORK 

A direct guidance system developed by NTT, which shows annotations on 
live videos, is similar to our system in terms of concept and technical 
framework. The advantages of our prototype over the direct guidance system 
are as follows: 

■ Networked video cameras can be used through a virtual video capture 
device. 

■ A function for protecting privacy is implemented. 

■ The LoD function is realized. 

■ Users can easily create and delete spatial information with the GUI. 

■ We consider using our system on the Internet, through integration with a 
Web browser and extension with XML. 

8. CONCLUDING REMARKS 

We proposed a framework of spatial hypermedia to augment and reduce 
information in the real world, and described our implementation of the prototype 
system for remote live videos. In Section 6, we suggested an extension of 
spatial hypermedia systems using XML in order to introduce spatial browsing 
of the WWW by spatial URLs, which can be defined by the extent of a real-world 
object in a video in addition to the position, direction and zooming ratio of the 
camera. 

Many live videos will become available on the Internet in the near future. 
Spatial hypermedia should become an important application for making good use of the 
information contained in the videos together with georeferenced information in Web 
space. 

Appendix: DTD for Extended Spatial Hypermedia 

<!DOCTYPE natlist [ 
<!ELEMENT natlist (nat*)> 
<!ATTLIST natlist camera CDATA> 
<!ELEMENT nat (lod, area, 3d)> 
<!ATTLIST nat name CDATA 
              url  CDATA 
              info CDATA> 
<!ATTLIST lod type (size | zoom | privacy)> 
<!ELEMENT area (camera, rectangle)> 
<!ATTLIST area number (1|2)> 
<!ATTLIST camera pan  CDATA 
                 tilt CDATA 
                 zoom CDATA> 
<!ATTLIST rectangle x0 CDATA 
                    y0 CDATA 
                    x1 CDATA 
                    y1 CDATA> 
<!ATTLIST 3d distance CDATA> 
]> 
Abstract Recently, it has been a matter of great importance to publish the mul- 
timedia data objects stored in various information sources. The World 
Wide Web is often used as publication media. In such context, a cru- 
cial point is how to present the results of the set-at-a-time operation 
(querying and restructuring of data in underlying information sources). 
Frameworks to specify how to query and restructure data are usually 
different from those to specify the presentation of the results in existing 
systems. This paper proposes a visual user interface which amalga- 
mates authoring, querying, and restructuring functions for multimedia 
Web view construction. The user is only required to drag and drop data 
objects, just like in typical authoring tools for HTML and SMIL pages. 
A feature of our user interface is that the user can designate an exist- 
ing data object as an example, which will serve as the representative 
of a set of data objects. Manipulation of an example is interpreted as 
manipulation of the set of data objects. Therefore, the object-at-a-time 
authoring framework and the set-at-a-time data manipulation (querying 
and restructuring) framework are integrated in a seamless way. Another 
feature is that the interface can cope with semistructured data, which 
often appear in the context of multimedia view construction. This pa- 
per also provides the formal semantics of data operations through the 
visual user interface. 

Keywords: Visual Interface, Visual Query, Multimedia, WWW Page Authoring 

1. INTRODUCTION 

Recently, it has been a matter of great importance to publish the data objects 
stored in various information sources. The World Wide Web is often used as the 
publication medium. The data objects often include not only numerical and text objects, but 
also multimedia objects such as image, audio, and video objects. Many systems have 
been proposed in the literature, and a large number of systems are currently used in 
practice. For example, Web-site management systems such as Strudel (Fernandez et 
al., 1998) can create different Web views on top of heterogeneous information sources. 
The information sources can be traditional databases and existing Web pages 
containing multimedia objects. Practical examples are the many tools for Web application 
development. One of their main functions is to show the query results of back-end 
databases in the form of Web pages. 

In such systems, a crucial point is the presentation of the results of the set-at-a-time 
operation (querying and restructuring of data in underlying information sources). The 
user is usually required to adopt different schemes to query and restructure data and to 
present the result. For example, Strudel requires the user to use StruQL for querying 
and restructuring data, while the HTML template language is used for specifying 
how to present the result. Also, typical Web application development environments 
require the user to use SQL for querying and offer visual tools for designing the layout 
of the result. 

This paper proposes a visual user interface which amalgamates authoring, query- 
ing, and restructuring functions for multimedia presentation. In our context, con- 
struction of multimedia presentation means creation of Web views, involving HTML 
and SMIL(W3C, 1998)-based multimedia Web documents, on top of heterogeneous 
information sources. The interface looks just like a common authoring tool for HTML 
and SMIL documents: The user is only required to drag and drop data objects 
presented in windows into a blank window named the canvas. He can put data objects 
anywhere he likes, and specify their sizes with mouse operations. As a feature, our 
interface allows the user to designate an existing data object as an example. Then, 
the data object (the example) serves as the representative of a set of data objects. A 
drag-and-drop operation of the example is interpreted as manipulation of the set of 
data objects. Therefore, the object-at-a-time authoring framework and the set-at-a- 
time data manipulation (querying and restructuring) framework are integrated in a 
seamless way. 

Another important feature of our user interface is that it can cope with semistruc- 
tured data (Abiteboul, 1997)(Buneman, 1997). This feature is important in multime- 
dia integration and presentation. Multimedia objects are often stored and managed 
in the World Wide Web, which is a well-known example of semistructured data. Be- 
cause the data structure is often irregular and implicit in semistructured data, the 
domain of the objects an example represents cannot be fixed in advance. In contrast 
to this, the domains are fixed in advance in QBE (Zloof, 1977) and other QBE-like 
query languages for relational databases. In our framework, the domain is defined 
dynamically according to the user's interaction with the user interface. By specifying 
‘another’ example, the user allows the interface to infer the intended domain. This 
feature is essential for operating on semistructured data. 

The visual user interface was originally designed for the information integration 
system InfoWeaver, which we have been developing for heterogeneous information 
integration (Kitagawa et al., 2000)(Morishima et al., 1999(b))(Morishima et al., 2000(a)). 
Although the fundamental design of the interface is independent of InfoWeaver, 
InfoWeaver provides one of the typical contexts where our user interface is useful. Figure 
1 shows the architecture of InfoWeaver. The mediator (Wiederhold, 1992) and 
wrappers (Roth et al., 1997) are used for integration. The wrappers provide the mediator 
with views on top of information sources (based on WebNR/SD, the common data 
model in our environment). The user manipulates data through the mediator. 




Figure 1 Integration environment InfoWeaver. 

The main contributions of this paper are as follows. 

1 We propose a visual user interface for constructing multimedia presentation 
on top of information sources. It amalgamates the object-at-a-time authoring 
operation and the set-at-a-time querying and restructuring operations. 

2 The information sources can be heterogeneous information sources involving 
semistructured data. Through the interaction with the user, the system infers 
the target data objects. 

3 We present the formal semantics of the data operations through the visual user 
interface. 

The rest of this paper is organized as follows. Section 2 shows an application 
scenario. Section 3 explains the basic concepts for the visual user interface design. 
Section 4 shows how to construct the multimedia Web view through the visual user 
interface. Section 5 explains the formal semantics. Section 6 briefly surveys related 
work. Section 7 is the conclusion. 



2. APPLICATION EXAMPLE 

This section shows an example of multimedia Web view construction. The infor- 
mation sources include RDBs and XML-based Web pages. The output is a collection 
of HTML and SMIL Web pages. First, we show the example scenario. Then, we 
explain how to use SMIL for multimedia presentation. 

2.1. EXAMPLE SCENARIO 

We consider two relational databases and the Web as information sources. (1) A 
baseball game video database: This is a relational database which contains video 
(RealMedia (RealNetworks, Inc.)) objects and their metadata. The video objects and their 
metadata are stored in the relation VIDEO(VID, GID, Begin, End, Batter, Pitcher, 
Contents). VID is a video ID. The domain of the attribute Contents is an ADT for 
RealMedia, named VIDEO type. The other attributes GID, Begin, End, Batter, and 
Pitcher represent metadata on the Contents. The meaning of the relation VIDEO is 
that each VIDEO value in Contents records a scene from the Begin time to 
the End time of a game (represented by GID), in which Batter and Pitcher are facing 
each other. (2) A baseball statistics database: This is a relational database which 
maintains the latest statistics about baseball players. This database contains the 
relation BATTING-STATS(P-Name, Hit, RBI, AVG). (3) A baseball players' profile 
Web site: This site contains the profile information of baseball players. The Web-site 
structure is shown in Figure 2(a). The index page contains links to baseball team 
pages. Each team page contains the team logo (as a reference to a GIF format file) 
and links to player pages. Each player page contains the profile data. We also assume 
that the pages on the site are written in XML and that there are two variations in 
the structure of the team pages. Figure 3 shows the two variations in page structure. 
One has a flat structure, while the other groups players into categories. 

The requirement here is to create a multimedia Web view on top of the above
information sources. A SMIL Web page is constructed for each player whose batting
average is more than 0.3 for the current season. It is a multimedia page (Figure 2(b)),
which consists of three different kinds of components: (1) A sequential rendering of
scenes (video objects) in which he is at bat. (2) The logo of the team he belongs to.
(3) A text description of his profile.

An index HTML page is also created (Figure 2(c)). It contains an image ob- 
ject (’GoodBatters’), the selected players’ names, batting averages, and links to the 
players’ multimedia pages. 
Figure 2 Multimedia Web view on top of heterogeneous information sources.



<team>
  <tname>Tigers</tname>
  <logo><img src="TigersLogo.gif"/></logo>
  <players>
    <player ppage="http://..">Johnson</player>
    <player ppage="http://..">Thomas</player>
  </players>
</team>



(a) Team page of Tigers 



<team>
  <tname>Giants</tname>
  <logo><img src="GiantsLogo.gif"/></logo>
  <players>
    <fielders>
      <player ppage="http://..">Larry</player>
    </fielders>
    <pitchers>
      <player ppage="http://..">Brian</player>
    </pitchers>
  </players>
</team>

(b) Team page of Giants



Figure 3 Team page variations. 



2.2. SMIL 

SMIL (Synchronized Multimedia Integration Language) can represent integrated 
multimedia contents as XML-based tagged text. Figure 4 is a part of a SMIL page, 
that represents a multimedia page explained in Subsection 2.1. 



<smil>
  <body>
    <par>
      <seq>
        <video src="Scene_J1.rm" region="R1"/>
        <video src="Scene_J2.rm" region="R1"/>
      </seq>
      <img src="TigersLogo.gif" region="R2"/>
      <textstream src="ProfileJ.rt" region="R3"/>
    </par>
  </body>
</smil>

Figure 4 Part of a SMIL page. 

This example contains two videos, one image, and one textstream. The <video/>,
<img/>, and <textstream/> tags represent references to the video, image, and
textstream objects, respectively. These tags have the following attributes. The src
attribute represents the URL of the referenced data object. The region attribute
represents the region where the data object is rendered. (The layout (region) definition
is omitted in Figure 4.)

Tags <seq> and <par> give synchronization information. The multimedia objects
directly surrounded by the <seq> tag are presented in a sequential way. On the other
hand, the <par> tag specifies that objects are presented in parallel. Therefore, the
example SMIL page specifies that the multimedia presentation shows in parallel a
sequence of two scenes (video objects), an image object, and a textstream object.
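
The nesting of <seq> inside <par> can be illustrated with a short sketch. The following Python code is only an illustration, not part of the system described in this paper: the function name build_player_smil is an assumption made here, and the layout (region) definition is omitted as in Figure 4. It builds the element structure of Figure 4 with the standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET


def build_player_smil(scenes, logo, profile):
    """Build a SMIL tree: scenes play in sequence, in parallel with logo and profile."""
    smil = ET.Element("smil")
    body = ET.SubElement(smil, "body")
    par = ET.SubElement(body, "par")      # children of <par> are presented in parallel
    seq = ET.SubElement(par, "seq")       # children of <seq> are presented one after another
    for src in scenes:
        ET.SubElement(seq, "video", src=src, region="R1")
    ET.SubElement(par, "img", src=logo, region="R2")
    ET.SubElement(par, "textstream", src=profile, region="R3")
    return smil


page = build_player_smil(["Scene_J1.rm", "Scene_J2.rm"], "TigersLogo.gif", "ProfileJ.rt")
smil_text = ET.tostring(page, encoding="unicode")
print(smil_text)
```

Adding a scene to the list extends only the <seq> element, so the logo and the profile text stay on screen for the whole sequence.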

3. BASIC CONCEPTS 

This section explains the basic concepts involved in the visual user interface design. 
Formal definitions are given in Section 5. 

3.1. WINDOWS 

The interface consists of three types of windows. 



Figure 5 DataBoxes.
[Recoverable panel titles: (a) DataBox3:BS(BATTING-STATS), (b) DataBox4:VD(VIDEO), (c) DataBox1:TP(TeamPage).]

Figure 6 Palette.
[Recoverable items: Image, Link, ListItem, HorizontalRule, and the ’GoodBatters’ image.]

DataBox: A DataBox is used to display a set of data items stored in an information
source. Figure 5 shows example DataBoxes. The DataBoxes (a) and (b) are used
to display relations in relational databases. In this case, each DataBox is connected
to a relation, and the display unit is a tuple. The user can click the Next and
Previous buttons to browse other tuples in the relation. The relation to be displayed
is designated by using the Open menu of the DataBox. The DataBoxes (c) and (d)
are used to display Web pages. In this case, each DataBox is associated with one or
more Web pages. The display unit is a Web page. The Web pages to be displayed
are designated either by specifying a URL or by using a mechanism to gather Web
pages. This paper assumes that we can gather Web pages using some mechanism.
We reported a page gathering mechanism via browsing and querying in (Morishima
et al., 1999(a)). Also, Web query languages such as WebSQL (Mendelzon et al., 1996)
can be used for this purpose. We omit the details because they are beyond the scope
of this paper. A DataBox displays one unit of data items (a tuple or a Web page) at a
time. In the remaining part of this paper, we simply call a display unit in a DataBox a
page.

Palette: A Palette is a window which provides various component data objects to be 
included in the data manipulation result. Figure 6 is an example of a Palette. The 
palette contains an image object, a hypertext link object, a dot object to represent a 
list item, a horizontal rule object, and so on. 

Canvas: The Canvas is a blank window into which the user can drag-and-drop data 
objects from DataBoxes and Palettes. The user can put data objects anywhere he 
likes, and specify their sizes with mouse operations. 

3.2. OBJECTS 

In our context, an object is the unit of drag-and-drop operation. For example, an 
element (a substring which is surrounded by <g> and </g> tags) contained in a Web 
page and an attribute value in a relation are objects. 
3.3. DRAG-AND-DROP

Drag-and-drop is the basic operation of this interface. By dragging-and-dropping 
objects from DataBoxes and Palettes into the Canvas, the user can construct various 
multimedia Web views on top of heterogeneous information sources. (See Figure 7.) In 
Figures 7~12, multiple pages are shown simultaneously in a DataBox for explanatory 
purposes. Actually, the user has to press Next and Previous buttons to see them. 





Figure 7 Drag-and-Drop operation and the result.



3.4. EXAMPLES 

Introduction of the concept of examples into a drag-and-drop-based page authoring 
framework is the distinguishing feature of our user interface design. The user can 
designate an object as an example by clicking a mouse button on the object. (We 
explain this in Section 4.) We call the object specified as an example an example object,
or simply an example. A drag-and-drop operation of an example is interpreted as
manipulation of a set of objects the example represents. Therefore, the object-at- 
a-time authoring framework and the set-at-a-time data manipulation (querying and 
restructuring) framework are integrated in a seamless way. 



3.5. TARGET SETS 

A set of objects an example represents is called the target set of the example. As 
a default, the target set is defined as the set of objects each of which appears at the 
same position on a page as the example object. Manipulation of an example means 
manipulation of objects in its target set (Figure 8). 




Figure 8 Manipulation of an example and the result.



3.6. ASSOCIATIONS 

When the user specifies multiple examples (and their target sets), it is often the
case that associations occur among the target sets. Two types of associations are
considered in this interface design. The first one is the structural association
(S-Association). This occurs according to a structural relationship (relative position)
between two examples. For example, if two examples are on the same page, it implies
an S-Association that objects taken from their target sets must reside on the same
page. (Actually, S-Association can have a more general meaning. We explain this in
Section 5.) The second one is the value association (V-Association). This occurs if
values of example objects are the same. If any association occurs among the target
sets, only some combinations of objects are qualified to be manipulated. Note that
an association serves as a kind of join condition.

For example, suppose that the user wants to change the page layout in the DataBox
in Figure 8 so that the name and the batting average appear side by side. If the
user takes example objects as in Figure 9, the system considers that there is no
S-Association between the two target sets. Therefore, all combinations of the objects
appear in the result. In contrast, if he takes examples as in Figure 10, S-Association
occurs and he gets the intended result.

Then, suppose that we have two DataBoxes as shown in Figure 11, and that the
user wants the pages each of which contains a player’s name, his team name, and his
batting average. If he takes example objects as in Figure 11, there is no V-Association.
However, if he takes examples as in Figure 12, V-Association occurs and he gets the
intended result.

We explain the formal semantics in Section 5. 
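
The role of associations as join conditions can be sketched in a few lines of Python. This is an informal illustration under simplifying assumptions, not the formal semantics: objects are modeled as (DataBox, page number, value) triples, the data values and the names same_page, same_value, and combine are hypothetical, and S-Association is reduced to its simplest form (residing on the same page).

```python
from itertools import product

# Hypothetical toy objects: (DataBox name, page number, value).
names = [("DB1", 1, "John"), ("DB1", 2, "Thomas")]
averages = [("DB1", 1, "0.304"), ("DB1", 2, "0.298")]
tp_names = [("TP", 1, "John"), ("TP", 2, "Larry")]
bs_names = [("BS", 1, "Larry"), ("BS", 2, "John")]


def same_page(a, b):
    """S-Association in its simplest form: both objects reside on the same page."""
    return a[0] == b[0] and a[1] == b[1]


def same_value(a, b):
    """V-Association: both objects carry the same value."""
    return a[2] == b[2]


def combine(set_a, set_b, association=None):
    """Return qualified value pairs; with no association, every combination qualifies."""
    return [(a[2], b[2]) for a, b in product(set_a, set_b)
            if association is None or association(a, b)]


no_assoc = combine(names, averages)              # all four combinations
s_assoc = combine(names, averages, same_page)    # only pairs taken from the same page
v_assoc = combine(tp_names, bs_names, same_value)  # only pairs with equal values
print(no_assoc, s_assoc, v_assoc, sep="\n")
```

Without an association the result is the full Cartesian product, which corresponds to the unintended results in Figures 9 and 11.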



Figure 9 No S-Association between target sets A and B.

Figure 10 S-Association between two target sets A and B.
(Only the pairs of objects on the same page are considered.)

Figure 11 No V-Association.

Figure 12 V-Association between target sets A and C.
(Only objects with the same value match each other.)

4. OPERATION 

This section explains how to manipulate data through this user interface, and gives 
specifications for the example scenario shown in Section 2. 

4.1. EXAMPLES AND TARGET SETS 

If the user clicks the right mouse button on an object in a DataBox, a menu 
appears (Figure 13). Selecting the “Example” menu item specifies that the object is 
an example. At first, the system assigns a default target set to the example object. 
It is the set of objects each of which resides at the same position on a page as the 
example. For example, in Figure 13, the target set would be the set of the P-Name 
values in Relation BATTING-STATS. The “Another” and “Clue” menu items are 
used to modify the target set. 

Example objects are always highlighted in DataBoxes. If the user clicks the left 
mouse button on an example object, objects in its target set are also highlighted. 
Therefore, the user can identify non-example objects, example objects, and their 
target sets anytime. 

The following regular expression shows the operation procedure through our user 
interface. 




( ’Example’ (’Another’ | ’Clue’)* | ’D&D’ )*



Here, ’Example,’ ’Another,’ and ’Clue’ mean selections of respective menu items on 
an object. The ’D&D’ means Drag-and-Drop operation. We call them ’Example,’ 
’Another,’ ’Clue,’ and ’D&D’ operations, respectively. Intuitively, the user interface 
allows any combinations of the following operation patterns. 



■ To specify that an object is an example, and accept the target set. 

■ To specify that an object is an example, and change the default target set by 
successive ’Another’ and ’Clue’ operations. 

■ To drag and drop a non-example object (an object which the user has not
specified as an example or one in a Palette), and arrange it on the Canvas.

■ To drag and drop an example object, and arrange it on the Canvas. 
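
The operation procedure can be checked mechanically. The sketch below is an illustration only: the one-letter encoding (E, A, C, D) and the function name is_valid_sequence are assumptions introduced here. It encodes each operation as a letter and tests the whole sequence against the regular expression given above.

```python
import re

# One-letter codes chosen for this sketch: E=Example, A=Another, C=Clue, D=D&D.
CODES = {"Example": "E", "Another": "A", "Clue": "C", "D&D": "D"}

# ( 'Example' ('Another' | 'Clue')* | 'D&D' )*
PROCEDURE = re.compile(r"(E[AC]*|D)*")


def is_valid_sequence(operations):
    """Return True if the operation sequence follows the procedure."""
    encoded = "".join(CODES[op] for op in operations)
    return PROCEDURE.fullmatch(encoded) is not None


print(is_valid_sequence(["D&D", "Example", "Another", "Another", "D&D"]))  # True
print(is_valid_sequence(["Another"]))                                      # False
```

The second call fails because ’Another’ and ’Clue’ are only meaningful after an ’Example’ operation.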
Figure 13 Menu to specify examples and target sets.

Figure 14 ClueBox.



If an ’Another’ operation is performed after an ’Example’ operation, the target set of
the example is extended to include the ’Another’ object. Intuitively, the system tries
to generalize the relationship between the position of the example and that of the
’Another’ object, and includes objects which have the generalized relationship with
the example into the target set. (We show the rules in Subsection 5.2.)

If a ’Clue’ operation is performed after an ’Example’ operation, the system narrows
the target set. First, the ClueBox appears on the display (Figure 14). The user
can change the shown condition into another one. For example, he can change the
condition “= 0.304” into “> 0.3.” In general, a ’Clue’ operation makes the target set
contain only the objects Oi each of which satisfies all the following conditions.

■ Oi is included in the original target set. 

■ Oi has a corresponding object Ci such that they have the same relative position 
as the example object and the ’Clue’ object have. (In the above example, Ci is 
the batting average of each player.) 

■ Ci satisfies the condition specified in the ClueBox. (In the example, the batting 
average value has to be more than 0.3) 

Therefore, in the example, the target set is narrowed to contain only the players 
whose batting averages are more than 0.3. 
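
The three conditions can be sketched as a filter over pages. The code below is a simplified illustration, not the system's implementation: each tuple of BATTING-STATS is modeled as one dictionary (one DataBox page), "the same relative position" is approximated by "the same page," and the data values and the name narrow_by_clue are hypothetical.

```python
import operator

# Hypothetical BATTING-STATS tuples; one tuple corresponds to one DataBox page.
pages = [
    {"P-Name": "Johnson", "AVG": 0.304},
    {"P-Name": "Thomas", "AVG": 0.275},
    {"P-Name": "Larry", "AVG": 0.315},
]

COMPARATORS = {">": operator.gt, "=": operator.eq, "<": operator.lt}


def narrow_by_clue(pages, example_attr, clue_attr, op, constant):
    """Keep each Oi whose companion Ci (same page, clue position) satisfies the condition."""
    compare = COMPARATORS[op]
    return [page[example_attr] for page in pages
            if example_attr in page and clue_attr in page
            and compare(page[clue_attr], constant)]


default_target = [page["P-Name"] for page in pages]          # default target set
narrowed = narrow_by_clue(pages, "P-Name", "AVG", ">", 0.3)  # condition "> 0.3"
print(default_target, narrowed)
```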



4.2. ASSOCIATIONS 

As mentioned in Subsection 3.6, S-Association occurs according to the structural
relationship (relative position) between two example objects. V-Association occurs if
two example objects have the same value.



4.3. GROUPING 

By default, one result page is generated for each (qualified) combination of objects
(see Figures 9-12). This rule can be changed by putting the repetition mark (*) at
the appropriate position on the Canvas. Essentially, it works as the Nest operator of the
nested relational algebra (Fischer, 1983). The following subsection includes examples
of grouping.



4.4. OPERATIONS FOR THE EXAMPLE 
SCENARIO 

Figure 15 illustrates the specification to obtain the required result of the example
scenario given in Section 2. We assume here that DataBoxes TP, PP, BS, and VD
contain all the team pages, all the player pages, Relation BATTING-STATS, and
Relation VIDEO, respectively. We show the operation sequence.
(1) Open the Canvas and declare the construction of an HTML page. (Then, the
system opens a space for the HTML page on the Canvas.)

(2) D&D the image ’GoodBatters’ from the Palette into the Canvas.

(3) D&D the ’Listitem’ object from the Palette into the Canvas.

(4) Specify that ’Johnson’ in TP is an example. The default target set includes
those players who appear first in the player list of each team page having
the structure shown in Figure 3(a). Next, specify that ’Thomas’ in TP is
an ’Another’ object. Then, press the Next button of the DataBox TP to
find the team page of Giants, and specify that ’Larry’ on the Giants’ page
is an ’Another’ object. (Alternatively, you can use another page which has
the structure shown in Figure 3(b).) The system uses rules to generalize the
relationship between positions of ’Johnson,’ ’Thomas,’ and ’Larry,’ so that the
target set of this example is extended to include all players of all teams.

(5) D&D ’Johnson’ from TP into the Canvas.



Figure 15 Specification for the example scenario.
[Recoverable panel titles: (a) DataBox1:TP(TeamPage), (b) DataBox2:PP(PlayerPage), (c) DataBox3:BS(BATTING-STATS), the VIDEO DataBox (VideoDB:VIDEO), (e) Palette.]



(6) Put a repetition mark (*) on the list item object. As a result, all the players 
are listed in this page. Otherwise, a new page is produced for each player. 

(7) Specify that ’Johnson’ objects in PP and VD are examples. Note that the 
three target sets of ’Johnson’ objects in TP, PP, and VD have V-Association. 
Therefore, this specifies equality joins between their target sets. 

(8) Specify that ’Johnson’ in BS is an example. (Its target set also has V-Association 
with the above three target sets.) Then, specify that ’0.304’ in BS is a clue 
of the example. Rewrite the condition in the ClueBox and make it ’> 0.3’ so 
that the target set of the example ’Johnson’ in BS includes only players with 
their batting averages over 0.3. 
(9) Specify that ’0.304’ in BS is an example. D&D it from BS into the Canvas.

(10) Declare the construction of a SMIL page. The system opens a space for the
SMIL page on the Canvas.

(11) D&D the HypertextLink object from the Palette into the Canvas. Connect it
to the SMIL page.

(12) Specify that the video object (the Contents value) in VD is an example. D&D it
into the Canvas.

(13) Put the repetition mark (*) on the dropped object. As a result, the scenes
(video objects) of a player are rendered sequentially as one video. Otherwise,
a SMIL page is produced for each scene.

(14) Specify that the team logo in TP is an example. D&D it into the Canvas.

(15) Specify that the profile in PP is an example. D&D it into the Canvas.

(16) Press the ’Create’ button on the Canvas. 

Figure 16 is a screen shot of the prototype system under development where the 
user is doing the example scenario operations. 




Figure 16 A screen shot from the prototype. 



5. SEMANTICS 

This section gives the formal semantics for the user interface in terms of predicate
logic and the nested relational algebra (Fischer, 1983). The formal semantics
defines the behavior of the user interface more precisely. Therefore, it is not only
useful for theoretical analyses, but it also makes the proposed scheme more easily
applicable to various contexts, such as semistructured databases and CGI-based Web
page generators.

We define the formal semantics in the following three steps. 

1 We express the source data as an object tree. 

2 We derive the target relation according to the user’s interaction. The target 
relation represents target sets and the associations among objects in the target 
sets. 

3 We restructure the relation into a nested relation which reflects the grouping 
structure specified by repetition marks on the Canvas. This nested relation 
specifies the final result. 
5.1. DATA MODELING

We represent the source data as an object tree. The tree in Figure 17 represents 




Every node is annotated with a label, which consists of a label name and a label
number. (We omit the root label for simplicity.) Some remarks follow.
(1) Subtrees whose roots are second level nodes (children of the root) correspond to
DataBoxes. They have labels in the form of ’DB.1,’ where DB is the name of the
corresponding DataBox. Subtrees whose roots are third level nodes correspond to
pages. They are labeled with ’PAGE.i.’ (2) A subtree whose root is a fourth level
node represents an XML page or a tuple of a relation. (3) The label number of a node
is 1 if none of its sibling nodes has the same label name. Otherwise, label numbers
are sequentially assigned to sibling nodes with the same label name. (4) We refer
to the nodes of the tree as objects and tag them with OIDs. In Figure 17, some
OIDs are explicitly presented in the form of &n for the convenience of the following
explanation. (5) Note that this is semistructured data. There are two kinds of team
pages whose structures are different. A PLAYER may be a direct child of a PLAYERS
element or be placed under other elements such as FIELDERS and PITCHERS.

In the following discussion, path(o) and value(o) denote the path from the root to
the object o and the value of o, respectively. For example, path(&12) =
TP.1->PAGE.1->TEAM.1->PLAYERS.1->PLAYER.1, and value(&12) = “Johnson.” We often
reference an object by its value if there is no ambiguity.
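
The labeling scheme and the path function can be sketched as follows. This is an illustrative data structure only: the class name Node, its fields, and the method add are assumptions made here, and OIDs are left out. Label numbers are assigned sequentially among siblings with the same label name, as stated in remark (3).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str                           # label name, e.g. "PLAYER"
    value: Optional[str] = None         # value(o), if the object carries one
    number: int = 1                     # label number among same-named siblings
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def add(self, child: "Node") -> "Node":
        """Attach a child and assign its label number among same-named siblings."""
        child.number = 1 + sum(1 for c in self.children if c.name == child.name)
        child.parent = self
        self.children.append(child)
        return child


def path(node: Node) -> str:
    """Labels from the root down to the node (the root label itself is omitted)."""
    labels = []
    while node.parent is not None:
        labels.append(f"{node.name}.{node.number}")
        node = node.parent
    return "->".join(reversed(labels))


root = Node("ROOT")
page = root.add(Node("TP")).add(Node("PAGE"))
players = page.add(Node("TEAM")).add(Node("PLAYERS"))
johnson = players.add(Node("PLAYER", "Johnson"))
thomas = players.add(Node("PLAYER", "Thomas"))
print(path(johnson))  # TP.1->PAGE.1->TEAM.1->PLAYERS.1->PLAYER.1
print(path(thomas))   # TP.1->PAGE.1->TEAM.1->PLAYERS.1->PLAYER.2
```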

5.2. TARGET SETS 

Each example object has a corresponding target set. Given an example e, its target
set (denoted by TS_e) is defined as follows.

Case 1: If no ’Clue’ operation has been invoked for e,

\[ TS_e = \{\, o \mid o \in O \wedge \mathit{Candidate\text{-}Pred}_e(o) \,\}, \]

where O is the set of all the objects in the object tree, and Candidate-Pred_e(o) is
a candidate predicate incorporating a path expression. A path expression is similar
to a path but may contain wildcards. Candidate-Pred_e(o) holds if and only if
path(o) conforms to the path expression. The candidate predicate is determined by
the ’Example’ and ’Another’ operations as shown below.

Example The following TS_&12 gives the target set specified by Operation (4) in
Subsection 4.4.

\[ TS_{\&12} = \{\, o \mid o \in O \wedge \text{TP.1} \to \text{PAGE.?} \to \text{TEAM.1} \to \text{PLAYERS.1} \to \text{?*} \to \text{PLAYER.?}[o] \,\} \]

Wildcards considered in the paper are listed in Figure 18. Given the target data
shown in Figure 17, TS_&12 = {“Johnson”, “Thomas”, ..., “Larry”, ..., “Brian”, ...}
(all players of all teams). Note that the wildcard ’?’ matches any label number,
and that ’?*’ matches any sequence of nodes (including the sequence
whose length is 0). Therefore, all players in the two different kinds of team pages are
included.



Wildcard    What to match with
Name.?      A node with label name Name
?           A node with any label
?*          Any sequence of any nodes (the length can be 0)

Figure 18 Wildcards.
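
A minimal matcher for the three wildcards of Figure 18 can be written as below. It is a sketch under stated assumptions: paths and path expressions are represented as lists of label strings, the function name conforms is introduced here, and annotations are not handled.

```python
from functools import lru_cache


def conforms(path, expr):
    """Return True if a path (list of 'NAME.n' labels) conforms to a path expression."""
    path, expr = tuple(path), tuple(expr)

    @lru_cache(maxsize=None)
    def match(i, j):
        if j == len(expr):
            return i == len(path)
        step = expr[j]
        if step == "?*":
            # Either match the empty sequence, or consume one node and stay on '?*'.
            return match(i, j + 1) or (i < len(path) and match(i + 1, j))
        if i == len(path):
            return False
        if step == "?":
            return match(i + 1, j + 1)       # a node with any label
        name, number = step.rsplit(".", 1)
        p_name, p_number = path[i].rsplit(".", 1)
        if name == p_name and number in ("?", p_number):
            return match(i + 1, j + 1)       # exact label, or 'Name.?'
        return False

    return match(0, 0)


EXPR = "TP.1->PAGE.?->TEAM.1->PLAYERS.1->?*->PLAYER.?".split("->")
flat = "TP.1->PAGE.1->TEAM.1->PLAYERS.1->PLAYER.2".split("->")
grouped = "TP.1->PAGE.2->TEAM.1->PLAYERS.1->FIELDERS.1->PLAYER.1".split("->")
logo = "TP.1->PAGE.1->TEAM.1->LOGO.1->IMG.1".split("->")
print(conforms(flat, EXPR), conforms(grouped, EXPR), conforms(logo, EXPR))
```

Players in both team page variations conform to the expression, while the logo path does not.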



Derivation of Candidate Predicates Candidate-Pred_e(o) is determined
by ’Example’ and ’Another’ operations. In the derivation process, paths and values
of objects play important roles. For this purpose, we add annotations to predicates.
Annotations for Candidate-Pred_e(o) give information on path(e) and value(e).
Annotations are surrounded by “(” and “)”. We represent the null sequence as ε.

For example, the specification of TS_&12 with annotations is as follows.

\[ TS_{\&12} = \{\, o \mid o \in O \wedge \text{TP.1} \to \text{PAGE.?}(\text{PAGE.1}) \to \text{TEAM.1} \to \text{PLAYERS.1} \to \text{?*}(\varepsilon) \to \text{PLAYER.?}(\text{PLAYER.1})[o(\text{Johnson})] \,\} \]

Note that the annotations give information on path(&12) and value(&12).

In general, Candidate-Pred_e(o) is derived as follows.

(1) First, when the user specifies that the object e is an example, the default candidate
predicate p[o(v)] is derived. Here, p is the same as path(e) except that its PAGE.i is
replaced by PAGE.?(PAGE.i), and v is value(e).

For example, consider Operation (4) in Subsection 4.4. When the user specifies
the object &12 (with its value “Johnson”) as an example, the default candidate
predicate derived is TP.1->PAGE.?(PAGE.1)->TEAM.1->PLAYERS.1->
PLAYER.1[o(Johnson)]. The predicate defines the default target set explained in
Subsection 4.1, that is, the set of objects which appear at the same position on
different pages.

(2) ’Another’ operations modify the candidate predicate to accept the ’Another’
objects. As mentioned in Section 4, the system tries to infer the target set the user
intends according to the example and ’Another’ objects. For this purpose, we use
modification rules as shown in Figure 19. Each rule prescribes how to modify the
original predicate according to path(e) and path(a), where e and a are the example
and an ’Another’ object, respectively. Note that we can get path(e) from the original
predicate because it has annotations. In Figure 19, B and C denote label names,
qi denotes a partial path, and pi denotes the partial path expression of the original
predicate which qi conforms to. qi' is a partial path that pi can accept. The basic
idea behind the rules is to place a wildcard at the position where path(e) and path(a)
conflict with each other.

For example, in Operation (4), the user specifies the object &24 (with its value
“Thomas”) as the first ’Another’ object. Then, path(a) = TP.1->PAGE.1->TEAM.1
->PLAYERS.1->PLAYER.2. We can obtain path(e) = TP.1->PAGE.1->TEAM.1->
PLAYERS.1->PLAYER.1 from the annotated default candidate predicate. The system
finds that the default candidate predicate cannot accept path(a) because PLAYER.1
in the path expression conflicts with PLAYER.2. In this case, they conflict at their
label numbers. Therefore, we can apply Rule 1. Here, p1 = TP.1->PAGE.?(PAGE.1)->
TEAM.1->PLAYERS.1, q1 = q1' = TP.1->PAGE.1->TEAM.1->PLAYERS.1, and
p2 = q2 = q2' = ε. The modified predicate becomes TP.1->PAGE.?(PAGE.1)->TEAM.1
->PLAYERS.1->PLAYER.?(PLAYER.1)[o(Johnson)]. Next, the user specifies the
object &33 (with its value “Larry”) as the second ’Another’ object. In a similar
way, we can apply Rule 3 to the modified predicate. This results in the above TS_&12.
Figure 19 Rules for modification of candidate predicates. (pi, qi, and qi' can be ε.)

Case 2: If a ’Clue’ object cl is specified for e with a ’Clue’ operation,

\[ TS_e = \{\, o \mid o \in O \wedge \mathit{Candidate\text{-}Pred}_e(o) \wedge \exists c \in O\, (\mathit{Clue\text{-}Pred}_{cl}(c) \wedge \mathit{SA\text{-}Pred}_{e,cl}(o, c)) \,\}, \]

where O is the set of all the objects in the object tree. Candidate-Pred_e(o) is the
candidate predicate which is derived according to the ’Example’ and ’Another’
operations as explained above. The clue predicate Clue-Pred_cl(c) and the S-Association
predicate SA-Pred_e,cl(o, c) are derived from the ’Clue’ operation. Clue-Pred_cl(c) is
also a predicate incorporating a path expression with wildcards. The difference is that
it prescribes a condition on value(c). The predicate SA-Pred_e,cl(o, c) constrains the
relative position of o and c. Intuitively, an object o in the target set has to satisfy the
candidate predicate, and be accompanied by an object c which satisfies the following
conditions as well.

■ o and c have the same structural association (relative position) as the example 
object e and the clue object cl. 

■ c satisfies the condition specified in the ClueBox. 

Example Operation (8) for the example scenario derives the following target set.
(It contains annotations, too.)

\[ TS_{\&79} = \{\, o \mid o \in O \wedge p \to \text{P-NAME.1}[o(\text{Johnson})] \wedge \exists c \in O\, (p \to \text{AVG.1}[c > 0.3] \wedge \mathit{Share}_p(o, c)) \,\}, \]

where p = BS.1->PAGE.?(PAGE.1)->TUPLE.1.

In the above example, “BS.1->PAGE.?(PAGE.1)->TUPLE.1->AVG.1[c > 0.3]” is
the clue predicate. The predicate holds if and only if path(c) conforms to the path
expression and value(c) is more than 0.3. SA-Pred_e,cl(o, c) has the form Share_p(o, c),
where p is a path expression. SA-Pred_e,cl(o, c) in the above example is
Share_p(o, c) with p = BS.1->PAGE.?->TUPLE.1. It holds if and only if path(o) and
path(c) share the same partial path which starts from the root and conforms to the
path expression “BS.1->PAGE.?->TUPLE.1” (see Figure 20). Therefore, TS_&79
contains only the players whose batting average is more than 0.3. Like candidate
predicates, clue predicates can be tagged with annotations to record path(cl). Thus,
we can obtain path(e) and path(cl) from annotated candidate and clue predicates.
In the above example, path(e) = path(&79 (with the value ’Johnson’)) =
BS.1->PAGE.1->TUPLE.1->P-NAME.1, and path(cl) = path(&91 (with the value
’0.304’)) = BS.1->PAGE.1->TUPLE.1->AVG.1.

A more formal description of the predicate derivation in Case 2 is given in
(Morishima et al., 2000(b)).
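
The Share predicate can be sketched for the simple case used in this example. The code is an illustration under a stated restriction: the path expression may contain exact labels and ’Name.?’ steps but no ’?*’ wildcard, so the shared partial path has the same length as the expression. The function names step_matches and share are introduced here.

```python
def step_matches(step, label):
    """Match one expression step ('NAME.n' or 'NAME.?') against one path label."""
    name, number = step.rsplit(".", 1)
    l_name, l_number = label.rsplit(".", 1)
    return name == l_name and number in ("?", l_number)


def share(expr, path_o, path_c):
    """Share_p(o, c) for expressions without '?*'.

    Holds if path(o) and path(c) begin with the same partial path and that
    partial path conforms to the expression.
    """
    k = len(expr)
    if len(path_o) < k or len(path_c) < k or path_o[:k] != path_c[:k]:
        return False
    return all(step_matches(s, l) for s, l in zip(expr, path_o[:k]))


P = "BS.1->PAGE.?->TUPLE.1".split("->")
name_1 = "BS.1->PAGE.1->TUPLE.1->P-NAME.1".split("->")
avg_1 = "BS.1->PAGE.1->TUPLE.1->AVG.1".split("->")
avg_2 = "BS.1->PAGE.2->TUPLE.1->AVG.1".split("->")
print(share(P, name_1, avg_1))  # True: same tuple
print(share(P, name_1, avg_2))  # False: different pages
```

The second call is False because the two paths diverge at the PAGE node, so a name is never paired with the batting average of another tuple.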




\[ (\forall o_i \forall o_j \in X)\; \mathit{Share}_{\text{BS.1} \to \text{PAGE.1} \to \text{TUPLE.1}}(o_i, o_j). \]
5.3. TARGET RELATION

A target relation represents the target sets and the associations among them. 

Definition Assume that there are n target sets without clues (specified by examples
e_1, ..., e_n), m target sets with clues (specified by examples e_{n+1}, ..., e_{n+m} and
clues cl_1, ..., cl_m), and l associations among them. Then, the target relation is defined
as follows.

\[
\begin{aligned}
TR = \{\, & (value(o_1), \ldots, value(o_{n+m})) \mid \\
& o_1 \in O \wedge \mathit{Candidate\text{-}Pred}_{e_1}(o_1) \\
& \wedge \cdots \\
& \wedge\; o_n \in O \wedge \mathit{Candidate\text{-}Pred}_{e_n}(o_n) \\
& \wedge\; o_{n+1} \in O \wedge \mathit{Candidate\text{-}Pred}_{e_{n+1}}(o_{n+1}) \wedge \exists c_1 \in O\, (\mathit{Clue\text{-}Pred}_{cl_1}(c_1) \wedge \mathit{SA\text{-}Pred}_{e_{n+1},cl_1}(o_{n+1}, c_1)) \\
& \wedge \cdots \\
& \wedge\; o_{n+m} \in O \wedge \mathit{Candidate\text{-}Pred}_{e_{n+m}}(o_{n+m}) \wedge \exists c_m \in O\, (\mathit{Clue\text{-}Pred}_{cl_m}(c_m) \wedge \mathit{SA\text{-}Pred}_{e_{n+m},cl_m}(o_{n+m}, c_m)) \\
& \wedge\; \mathit{Association\text{-}Pred}_1(o_{a_1}, o_{b_1}) \wedge \cdots \wedge \mathit{Association\text{-}Pred}_l(o_{a_l}, o_{b_l}) \,\},
\end{aligned}
\]

where Association-Pred_i is an S-Association predicate or a V-Association predicate. As
explained before, an S-Association predicate has the form Share_p(o, o'). A V-Association
predicate has the form o = o', and holds if and only if value(o) = value(o').
S-Association predicates are determined by paths of example objects and candidate
predicates. V-Association predicates are derived from the values of example objects.
Derivation of the association predicates is given in (Morishima et al., 2000(b)).

Example The target relation for the example scenario is shown in Expression A. 
(It shows annotations, too.) The superscript number indicates the operation number 
in Subsection 4.4 to which the predicate corresponds. 



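\[
\begin{aligned}
\{\, & (value(o_1), \ldots, value(o_8)) \mid \\
& o_1 \in O \wedge p_1 \to \text{PLAYERS.1} \to \text{?*}(\varepsilon) \to \text{PLAYER.?}(\text{PLAYER.1})[o_1(\text{Johnson})]^{(4)} \\
& \wedge\; o_2 \in O \wedge p_1 \to \text{LOGO.1} \to \text{IMG.1}[o_2(\text{logo})]^{(14)} \\
& \wedge\; o_3 \in O \wedge p_2 \to \text{NAME.1}[o_3(\text{Johnson})]^{(7)} \\
& \wedge\; o_4 \in O \wedge p_2 \to \text{PROFILE.1}[o_4(\text{ProfileJ})]^{(15)} \\
& \wedge\; o_5 \in O \wedge p_3 \to \text{P-NAME.1}[o_5(\text{Johnson})]^{(8)} \\
& \qquad \wedge\; \exists c_1 \in O\, (p_3 \to \text{AVG.1}[c_1 > 0.3]^{(8)} \wedge \mathit{Share}_{p_3}(o_5, c_1)^{(8)}) \\
& \wedge\; o_6 \in O \wedge p_3 \to \text{AVG.1}[o_6(0.304)]^{(9)} \\
& \wedge\; o_7 \in O \wedge p_4 \to \text{BATTER.1}[o_7(\text{Johnson})]^{(7)} \\
& \wedge\; o_8 \in O \wedge p_4 \to \text{CONTENT.1}[o_8(\text{video})]^{(12)} \\
& \wedge\; \mathit{Share}_{p_1}(o_1, o_2) \wedge \mathit{Share}_{p_2}(o_3, o_4) \wedge \mathit{Share}_{p_3}(o_5, o_6) \wedge \mathit{Share}_{p_4}(o_7, o_8) \\
& \wedge\; o_1 = o_3 \wedge o_1 = o_5 \wedge o_1 = o_7 \,\}
\end{aligned}
\]

where

p1 = TP.1->PAGE.?(PAGE.1)->TEAM.1,
p2 = PP.1->PAGE.?(PAGE.1)->PLAYERINFO.1,
p3 = BS.1->PAGE.?(PAGE.1)->TUPLE.1, and
p4 = VD.1->PAGE.?(PAGE.1)->TUPLE.1.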



Expression A. Specification of the target relation with annotations. 



Figure 21 shows all the target sets and associations involved in the example sce- 
nario. Figure 22 shows the target relation based on the target sets and associations. 
Figure 22 Target relation.



5.4. CREATION OF THE NESTED 
STRUCTURE 

Creation of the nested relation reflecting the specified grouping structure is
straightforward. It depends on the position of examples and repetition marks (*) on the
Canvas. Figure 23 shows the grouping specified on the Canvas shown in Figure 15.
The nested relation satisfying the requirement is constructed by applying Projection
and Nest operators (Fischer, 1983). In this example, the following expression creates
the nested relation shown in Figure 24.

\[ \mathit{Nest}_{U=(\text{video})}\bigl(\pi_{\text{Johnson},\, 0.304,\, \text{video},\, \text{logo},\, \text{ProfileJ}}(TR)\bigr) \]

Figure 25 shows the result of mapping the nested relation into the Web structure 
consisting of the index HTML page and SMIL pages. Figure 4 corresponds to the 
result SMIL page for Johnson. 

Figure 23 Grouping specification. 



Figure 24 Result nested relation. 

Figure 25 The result of data operation. 



6. RELATED WORK 

To the best of our knowledge, the visual user interface proposed in this paper is the 
first one that attains seamless integration of authoring, querying, and restructuring 
of data for multimedia view construction. However, there are several systems which 
support users in specifying presentation of the set-at-a-time data manipulation result. 
DelaunayMM (Cruz et al., 1998) provides a visual user interface where the user can 
drag and drop graphical icons. The icons are used to represent the multimedia data 
objects that are the result of queries from different information sources. However, the 
user has to enter SQL-like queries in a visual format. In particular, it uses WebSQL 
(Mendelzon et al., 1996) as the target language for Web queries. RBE (Kishnamurthy 
et al., 1995) also allows users to drag and drop various GUI widgets for the purpose of 
rendering data stored in a database. In RBE, explicit utilization of domain variables is 
required to specify how to connect data in the database to the widgets. Moreover, the 
source data of rendering in RBE is assumed to be well-structured data. (In fact, the 
source data in (Kishnamurthy et al., 1995) is a single relation.) Tiramisu (Anderson 
et al., 1999) is a web-site management system where presentation of the synthesized 
Web pages can be specified with authoring tools such as FrontPage. (They refer to this 
specification process as implementation.) In Tiramisu, querying and restructuring of 
the underlying data have to be done with the site schema, quite a different operational 
framework from that for implementation tools. The basic idea underlying Tiramisu 
is that logical design of a web-site should be independent of its presentation. But we 
believe that the idea does not necessarily imply that the operational frameworks for 
logical design and presentation design have to be different. SuperSQL (Toyama, 1998) 
and SQL+D (Baral, 1998) are query languages which integrate display specification 
into SQL queries. They provide no means to represent queries visually. 

There are a number of authoring tools for HTML and SMIL pages. Basic design 
of our visual user interface has been influenced by DreamWeaver (Macromedia, Inc), 
where various data objects can be arranged by drag-and-drop operations. FrontPage 
(Microsoft Corp.) allows users to incorporate SQL queries into special documents 
named ASP. But it has no concept of “Example” for the set-at-a-time manipulation. 
Also, there are some SMIL authoring tools such as RealProducer G2 Authoring Tool 
(RealNetworks, Inc.). As far as we know, there is no SMIL authoring tool which 
provides querying facilities. 

The concept of “Example” was first introduced in QBE (Zloof, 1977). QBE was de- 
signed for relational databases where data has flat structure. Construction of nesting 
structures is supported by languages such as STBE (Ozsoyoglu, 1989) and RBE (Kish- 
namurthy et al., 1995). They all assume that data has explicit and regular structure, 
which is not guaranteed in the context of semistructured data. Recently, several visual 
query languages for semistructured data have been proposed. DataGuide (Goldman 
et al., 1997) gives an abstract specification of semistructured data. It can be used 
as a query language with examples. HQBE (Kitagawa et al., 2000)(Morishima et al., 
1999(a)) can construct various views over the Web, RDBs, and structured documents. 
Also, there are a number of graphical query languages for various information systems 
(Chavda et al., 1997) (Kuntz et al., 1989), some of which support query formulation 
with drag-and-drop. All these languages require the user to specify queries according 
to some metadata such as database schema. In contrast to this, our framework infers 
queries from instance-based example operations. In this sense, our approach is similar 
to the query by example approach in the information retrieval context (Flickner et 
al., 1995) (Ishikawa et al., 1998). The difference is that our framework has to infer 
the intended data operation rather than distance-based (similarity-based) queries. 

7. CONCLUSION 

In this paper, we have proposed a visual user interface which amalgamates author- 
ing, querying, and restructuring functions in construction of multimedia Web views. 
By introducing the concept of designating existing data objects as examples into the 
drag-and-drop-based operational framework, the interface allows the user to seam- 
lessly integrate object-at-a-time and set-at-a-time operations. The proposed scheme 
can also cope with semistructured data which often appears in the context of mul- 
timedia Web view construction. The interface allows the binding of examples to be 
decided dynamically according to the user’s interaction. We have provided the formal 
semantics of the data operation through the interface. 

A prototype system to implement the proposed scheme is under construction. 
Experimental evaluation of the interface is an important future research issue. We 
believe that in the context of InfoWeaver the interface is far more user friendly than 
the existing interface, but we have to verify this by quantitative analyses. Also, it 
is important to clarify rules which meet the user’s intention in practical situations. 
Another important research issue is analysis of the expressive power of the framework 
based on the formal semantics. These issues will be discussed in forthcoming papers. 

Notes 

1. The <g/> is called an empty-element tag. It is semantically equivalent to <g></g>. 

2. value(e) is used to derive V-Association. We explain details in (Morishima et al., 
2000(b)). 

3. The annotations are used only in the derivation process. They are removed after the 
derivation process is completed. 

4. In query languages for semistructured data such as Lorel (Abiteboul et al., 1997), path 
variables are used to represent this association. For example, 
$A.1 \to B.? \to C[o_1] \land A.1 \to B.? \to D[o_2 > 3] \land Share_{A.1 \to B.?}(o_1, o_2)$ 
would be translated into 
$A.1 \to B.?\{R_1\} \to C[o_1] \land A.1 \to B.?\{R_2\} \to D[o_2 > 3] \land R_1 = R_2$, 
where $R_1$ and $R_2$ are path variables. However, in this paper, we 
adopt the former style in order to clearly separate the predicates. 

5. If multiple repetition marks appear at the same hierarchy level, the order of applying 
Nest operators must be specified by the user, since Nest operators are not commutative. 

Abstract: 



In this paper we present BBQ (Blended Browsing and 
Querying), a graphical user interface for seamlessly 
browsing and querying XML data sources. BBQ 
displays the structure of multiple data sources using a 
paradigm that resembles drilling-down in Windows’ 
directory structures. BBQ allows queries incorporating 
one or more of the sources. Queries are constructed in 
a query-by-example (QBE) manner, where DTDs play 
the role of schema. The queries are arbitrary 
conjunctive queries with GROUPBY, and their results 
can be subsequently used and refined. To support 
query refinement, BBQ introduces virtual result 
views: standalone virtual data sources that (i) are 
constructed by user queries, from elements in other 
data sources, and (ii) can be used in subsequent 
queries as first-class data sources themselves. 
Furthermore, BBQ allows users to query data sources 
with loose or incomplete schema, and can augment 
such schema with a DTD inference mechanism. 

1. INTRODUCTION 

As the World-Wide Web comes to be viewed as a large semistructured database 
with XML [XML98] as its model, issues related to querying semistructured data in 
general, and XML in particular, become more important. Unlike relational or 
object-oriented databases, semistructured data sources have content structure that 
is too irregular to easily map to a rigid schema. Recent research in this area has 
focused on query languages [BDS95, BDH+96, AQM+97], semistructured data 
extraction [HGC+97] and systems for integrating heterogeneous sources 
[CMH+94, MAG+97, BLP+98]. Research has largely ignored the issue of user 
interfaces for browsing and querying semistructured data; we believe this issue 
will be of great importance in the near future as XML becomes more widely used by the 
general public. 

The MIX (Mediation of Information using XML) project [BLP+98] exports to 
clients integrated views of autonomous XML data sources, using a well-known 
mediator architecture [Wie92]. The focus of this paper is the MIX client - a 
graphical user interface called BBQ (Blended Browsing and Querying). BBQ 
facilitates intuitive querying and browsing of one or more XML data sources, 
seamless iterative query refinement, and structure discovery. Data sources are 
displayed in a multi-document interface (MDI) layout: each source is assigned a 
window, with its data and schema shown side-by-side within the frame. Both the 
data and the schema are displayed as directory-like tree structures, which users can 
navigate and place conditions on. 

BBQ combines the search paradigm used in relational databases with Web 
searching. As with relational database searches, querying in BBQ is schema-driven 
(using XML DTDs); but, as with Web searches, we assume that the user does not know in 
advance exactly which query result he is interested in. The system therefore 
emphasizes navigation of schemas and data, and query results can be iteratively 
refined. 

We support query refinement by having query results be sources used in 
subsequent queries. Users can construct a query result document (essentially a 
virtual view) and that document becomes a first-class data source within BBQ, 
meaning it can be browsed, queried, or used to construct another query result 
document. We facilitate the perception of a query result as a source by 
automatically inferring its DTD from the query statement and the source DTDs 
using an algorithm similar to MIX’s view inference module [PV99]. Furthermore, 
once the data for the query result are obtained, the structure discovery module 
figures out the structure of elements that were loosely specified (using the 
keyword ANY) in the source DTDs and displays it accordingly. 

BBQ is built using the MIX mediator’s DOM client API, which provides an 
efficient platform for virtual views. In particular, the MIX mediator’s lazy 
implementation of DOM, called DOM-VXD [LPV99], does not actually retrieve 
source objects until they are absolutely necessary. Hence it is an ideal framework 
for browsing large query results, where the user may eventually browse only a 
small part of the query result and then refine the query. 

The intended audience of the BBQ interface is persons who understand the 
nested structure of XML data and have both the need and the expertise to use 
database query language concepts such as joins and aggregates. A typical user 
would be a webmaster or a database administrator/programmer. However, BBQ 
has a programming interface that allows for the creation of simpler user interfaces, 
if one would be so motivated as to design one for less sophisticated users. This is 
discussed further in Section 4.3. 



1.1. Related Work 



There are very few visual interfaces for querying and browsing semistructured 
data, and fewer still for XML. The bulk of the effort on semistructured data is 
focused on query language design, processing, schema, view inference, etc.; 
interfaces appear to be tacked on merely as test drivers. It is therefore no surprise 
that BBQ was most strongly influenced by a visual interface designed not for 
semistructured data, but for object databases: IBM’s PESTO [CHM+96]. 
PESTO’s design introduced the notion of fusing query formulation and result 
browsing into one seamless paradigm, termed Query-In-Place. However, 
PESTO’s use of temporary synchronizers for viewing joins across multiple 
databases breaks down the seamlessness; users can browse tuples in the 
synchronizer, but cannot build queries on top of them. Also, as an object database 
interface, PESTO is not equipped to display irregular schema common to 
semistructured data. [GGK99] is similar to BBQ in that schema is presented in a 
graphical manner, and a special mechanism (a “copy and drop” feature) allows 
users to use sample data from results as constraints in subsequent queries. 

Visual interfaces that can display irregular schema, such as those found in 
EquiX [CKK+99] and Lore’s DataGuides [GW97, GW98], do not support queries 
across multiple DTDs. Further, in DataGuides the tree structures do not accurately 
render the data’s structure; conjunction, disjunction, and repetition (‘?’, etc) 
are not shown. Omitting structural clues can lead users to create unsatisfiable 
queries, for example if the user asks for a sequence consisting of condition1(a) and 
condition2(a), when in reality only one a exists (a is not adorned with '*' or '+'). 
One system that accurately displays DTDs (wildcards and all) is IBM Alphaworks’ 
VisualDTD [IBM99], although VisualDTD functions as a DTD composition tool 
only. The Araneus project [MAM+98] allows for querying of HTML pages using 
POLYPHEMUS, a visual interface that shows connected web pages as a large 
diagram. 



1.2. Paper Outline 

Section 2 describes the basic look and feel of BBQ. Section 3 covers the BBQ 
query cycle, and issues related to query formulation and refinement: XMAS 
[LPV+99] query construction, creation of virtual views, and incremental 
refinement of DTDs. Section 4 covers BBQ “accessories” and the application 
programming interface. Section 5 presents BBQ’s current implementation status 
and discusses future work. 



2. THE BBQ INTERFACE 



In this section we introduce the reader to BBQ’s representation of XML data, and 
its uniform interface for structural navigation of both data and schema. First, we 
describe some sample DTDs that will be used in examples throughout the paper. 



Figure 1. A set of base DTDs 

(a) 

<!DOCTYPE CSEStudents [ 
  <!ELEMENT CSEStudents (CSEStudent)*> 
  <!ELEMENT CSEStudent (name, advisor?, degree)> 
  <!ELEMENT name (#PCDATA)> 
  <!ELEMENT advisor (#PCDATA)> 
  <!ELEMENT degree (#PCDATA)> 
]> 

(b) 

<!DOCTYPE Interns [ 
  <!ELEMENT Interns (Intern)*> 
  <!ELEMENT Intern (name, supervisor, sponsor)> 
  <!ELEMENT name (#PCDATA)> 
  <!ELEMENT supervisor (#PCDATA)> 
  <!ELEMENT sponsor (CAIDA | NPACI | UCSD)> 
  <!ELEMENT CAIDA (ANY)> 
  <!ELEMENT NPACI (ANY)> 
  <!ELEMENT UCSD (ANY)> 
]> 

(c) 

<!DOCTYPE ResHalls [ 
  <!ELEMENT ResHalls (ResHall | RitzyResHall)+> 
  <!ELEMENT ResHall (hallName, apartments)> 
  <!ELEMENT RitzyResHall (hallName, apartments)> 
  <!ELEMENT hallName (#PCDATA)> 
  <!ELEMENT apartments (apartment)+> 
  <!ELEMENT apartment (floor, number, tenant)> 
  <!ELEMENT floor (#PCDATA)> 
  <!ELEMENT number (#PCDATA)> 
  <!ELEMENT tenant (#PCDATA)> 
]> 

2.1 Base DTDs 



BBQ is XML-driven, i.e. schema and data instances are represented by XML 
DTDs and XML documents respectively. Figure 1 shows a set of DTDs exported 
by the MIX mediator. We refer to these DTDs as base DTDs. A base DTD may be 
the integrated view of multiple data sources, but BBQ need not be aware of this; 
BBQ thinks of each DTD as emanating from a single XML data source. 

Let us assume that the CSEStudents DTD source shown in Figure 1(a) is a 
database of graduate students in UCSD’s Computer Science program, Interns in 
Figure 1(b) is a database of summer interns at SDSC, and ResHalls in Figure 
1(c) is a database of UCSD’s Residence Halls tenants. The CSEStudents 
relation contains zero or more CSEStudent elements. Each CSEStudent has a 
name, a degree, and perhaps an advisor. The Interns relation contains 
zero or more Intern elements. Each Intern has a name, a supervisor, and 
a sponsor. Note that the intern’s sponsors, CAIDA, NPACI and UCSD, are 
described with the special XML token ’ANY’, meaning they can be any XML 
subtree. The ResHalls relation contains one or more of either ResHall or 
RitzyResHall elements. Each ResHall/RitzyResHall contains a hall name (hallName) 
and a list of one or more apartment elements (apartments). The apartment 
consists of the floor number (floor), apartment number (number) and the 
tenant’s name (tenant). Finally, each of these data sources has an associated set 
of XML document instances. 
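
To make the relationship between a base DTD and its document instances concrete, the following Python sketch reads a small document that conforms to the CSEStudents DTD. The sample values and the helper name `read_students` are assumptions made for illustration; only the element names come from Figure 1(a). Note how the optional advisor element simply yields `None` when it is absent.

```python
import xml.etree.ElementTree as ET

# Illustrative instance document for the CSEStudents DTD of Figure 1(a).
SAMPLE = """
<CSEStudents>
  <CSEStudent>
    <name>Ricardo Montablan</name>
    <advisor>Ed Anderson</advisor>
    <degree>PhD</degree>
  </CSEStudent>
  <CSEStudent>
    <name>Pablo Nguyen</name>
    <degree>MS</degree>
  </CSEStudent>
</CSEStudents>
"""

def read_students(xml_text):
    """Return one dict per CSEStudent element; advisor is None when omitted."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "name": student.findtext("name"),
            "advisor": student.findtext("advisor"),  # optional in the DTD
            "degree": student.findtext("degree"),
        }
        for student in root.findall("CSEStudent")
    ]

students = read_students(SAMPLE)
```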



2.2 The Interface 

Figure 2: The BBQ Interface 



The BBQ interface consists of one main window and zero or more floating 
windows. The main window contains a toolbar, a split pane, and a message 
console, while the floating windows contain a toolbar and split pane only. BBQ is 
a standalone Java application; at the time of this writing, a well-known Java Swing 
bug prevents BBQ from functioning properly as an applet^. Eventually, BBQ will 
be available as both. 

We begin our example by selecting and browsing two DTDs exported by the 
mediator (Figure 2). This is the beginning of a query session and the beginning of 
the first query cycle; Section 3 discusses query cycles in detail. The DTDs are 
represented as trees in the obvious hierarchical manner: an element name is a 
parent node, and that element’s sub-elements (defined in its contentspec) are 
its children. BBQ features special tree nodes to represent XML DTD's structural 
operators such as the choice and the seq(uence) [XML98]. For example, 
Figure 2 shows the CSEStudent element with one child node labeled SEQ, with 
CSEStudent's contentspec as SEQ’s children, and advisor labeled with a 
question mark (?). This is the visual equivalent of: 

<!ELEMENT CSEStudent (name, advisor?, degree) >. 

These special tree nodes give the user a more accurate view of the DTD's structure 
than other semistructured-data viewing systems, and they also facilitate more 
complex queries. For example, a default order constraint is introduced, namely the 
one that corresponds to the order in which elements are listed on the screen. The 
users can create queries where sibling order is irrelevant, by right-clicking on the 
SEQ node and de-selecting "Constrain Order". In the display SEQ becomes AND, 
and internally the order constraint will not be applied when the query is generated. 
Figure 2 illustrates that BBQ can simultaneously display multiple DTDs. This 
feature may seem simple and obvious, yet it is powerful and surprisingly unique 
among the semistructured-data display tools we have come across. Its power comes 
from the type of queries it allows: joins and functions across databases. In the 
main BBQ window, at most one DTD can be displayed, in the left panel. Each 
subsequently requested DTD is displayed in a floating window, and has a unique 
string appended to its name in the event that the same DTD is opened in multiple 
windows. The DTDs can be browsed and queried independently, or combined in 
queries. 
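
The mapping from a contentspec to the special SEQ and AND tree nodes described above can be sketched as follows. This is an illustrative sketch rather than BBQ's code: the function name, the dictionary layout, and the label `CHOICE` for the choice operator are assumptions, and only flat (non-nested) content models are handled.

```python
import re

def content_model_tree(element, model, constrain_order=True):
    """Turn a flat DTD content model into a small display tree.

    Only one level of ',' or '|' is handled; nested groups are not.
    """
    match = re.fullmatch(r"\s*\((.*)\)\s*([*+?]?)\s*", model)
    if match is None:
        raise ValueError(f"unsupported content model: {model!r}")
    inner, group_suffix = match.groups()
    if "|" in inner:
        label, parts = "CHOICE", inner.split("|")
    else:
        # De-selecting "Constrain Order" turns the SEQ node into AND.
        label, parts = ("SEQ" if constrain_order else "AND"), inner.split(",")
    return {
        "element": element,
        "operator": label + group_suffix,
        "children": [part.strip() for part in parts],
    }

ordered = content_model_tree("CSEStudent", "(name, advisor?, degree)")
unordered = content_model_tree("CSEStudent", "(name, advisor?, degree)",
                               constrain_order=False)
sponsor = content_model_tree("sponsor", "(CAIDA | NPACI | UCSD)")
```

The optional marker stays attached to the child label (`advisor?`), mirroring the question mark shown next to advisor in the display.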

At the top left of the main BBQ window are BACK and FORWARD arrows. 
These buttons let the user go back to previous query cycles. To the right of the 
FORWARD arrow is the VIEW NEW DATA SOURCE button, which brings up 
the list of exported DTDs. Next is the VIEW/EDIT JOINS button, which shows a 
dialog box containing an editable list of joins set so far in this query cycle. To the 
right of the VIEW/EDIT JOINS button is the VIEW SOURCE button, which 
brings up a dialog box containing a text version of the main window’s DTD. 
Finally, to the right is the EXECUTE QUERY button, which triggers BBQ to 
complete the query cycle by (among other tasks) composing an XMAS query, 
sending it to the MIX mediator, and displaying the results. 

The right-hand panel of the main BBQ window is where XMAS queries with 
conditions on multiple DTDs are constructed. The node labeled ‘Construct Head 
Here’ is the “root” upon which the query result document is built. Section 3.2 
describes the role of this panel in the query construction process. 



2.3 Browsing DTDs And XML 



BBQ displays each requested data source in a tabbed pane set: one tabbed pane 
shows the DTD, the other shows a set of corresponding XML instance data, 
represented as a directory tree (Figure 3). We mentioned earlier that query result 
instances are materialized on demand via DOM-VXD. It turns out that base 
DTD instances are retrieved from the mediator using the same mechanism. The 
XML data is materialized on demand from the source, in customizable 
increments. The buttons labeled next and previous in the XML panel retrieve the 
next and previous n instances, respectively. We describe an instance as a child 
subtree of the XML document root element. As is obvious from the figures, the 
presentation and browsing of both schema and data is done using the well-known 
paradigm of navigating into a directory structure. 
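
The on-demand retrieval behind the next and previous buttons can be approximated by the sketch below. It does not use DOM-VXD; a Python generator stands in for the lazy source, and the class name `InstancePager` and its methods are hypothetical. The point illustrated is that instances are pulled from the source only when a page that needs them is requested.

```python
from itertools import islice

class InstancePager:
    """Pages through instances, pulling them from the source only on demand."""

    def __init__(self, source, page_size):
        self._source = iter(source)   # lazy stand-in for the mediator
        self._cache = []              # instances retrieved so far
        self._page_size = page_size
        self._start = -page_size      # so the first next_page() starts at 0

    def _page(self, start):
        end = start + self._page_size
        missing = end - len(self._cache)
        if missing > 0:
            self._cache.extend(islice(self._source, missing))
        return self._cache[start:end]

    def next_page(self):
        start = self._start + self._page_size
        page = self._page(start)
        if page:                      # advance only if there is data to show
            self._start = start
        return page

    def previous_page(self):
        self._start = max(0, self._start - self._page_size)
        return self._page(self._start)

fetched = []                          # records what was actually retrieved

def lazy_source():
    for i in range(5):
        fetched.append(i)
        yield f"instance-{i}"

pager = InstancePager(lazy_source(), page_size=2)
first = pager.next_page()
fetched_after_first = list(fetched)
second = pager.next_page()
back = pager.previous_page()
```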




(a) 

Start Here 
  CSEStudents 
    CSEStudent* 
      SEQ 
        name 
          #PCDATA 
        advisor? 
          #PCDATA 
        degree 
          #PCDATA 

(b) 

<CSEStudents> 
  <CSEStudent> (collapsed) 
  <CSEStudent> (collapsed) 
  <CSEStudent> 
    <name> Ricardo Montablan 
    <advisor> Ed Anderson 
    <degree> PhD 
  <CSEStudent> 
    <name> Pablo Nguyen 
    <degree> MS 
  <CSEStudent> (collapsed) 

Figure 3: CSEStudents DTD (a), and corresponding XML document (b) 



3. A BBQ QUERY CYCLE 



In this section we describe BBQ’s intuitive query formulation, its XMAS query 
generation, and its unique query refinement capabilities. We first define query 
session and query cycle, then step through an example, highlighting BBQ features 
along the way. 

A query session is the set of events that occur while BBQ is connected to the 
mediator. Each query session consists of one or more query cycles. A query cycle 
is the set of events that starts with the user constructing a query, and ends with the 
user browsing the query result. The basic BBQ query cycle takes place in four steps. 
First, constraints are set on the data sources. Second, a tree representing the query 
result schema is created by dragging and dropping elements. Third, the XMAS 
query is generated and submitted to the mediator. Fourth, a DTD is generated for the 
query result and the query result schema and data are displayed. (Recall that generating 
a schema for the query result is done using just the query statement and the source 
DTDs, as described in [PV99].) XMAS queries are of the form 

CONSTRUCT head WHERE body 



The body describes the constraints to be set on data to be extracted, and the head 
describes the data to be displayed. From the above discussion it should be clear that 
step one of the query cycle corresponds to constructing the XMAS body, and step 
two to the head. The following two sections show by example how to set constraints 
and construct the head respectively. 

Figure 4. Setting constraint on the contentspec of degree 



3.1. Setting Constraints, Joins, and Filtering 



Constraints can be set on the leaf nodes of the DTD tree or XML tree. Constraints 
cannot be set on nonleaf nodes (contentspec that are not terminals like 
#PCDATA or EMPTY) only because XMAS does not support element constraints 
that are more complex than the default existential one. The operators are a basic set 
of comparators (including “=” and ’substr’); in the future we anticipate mediators 
exporting a list of special operators they support. 

As an example of constraint setting, Figure 4 shows the user restricting 
CSEStudents to doctoral-track students only. The user right-clicks the degree 
element and selects "View/Edit Constraint..." from the popup menu. This action 
brings up the "View/Edit Constraint" dialog box, where “=” is selected as the 
operator, and “PhD” is typed in as the operand. At this point, the user clicks “OK”, 
and degree is visibly marked with the constraint (not shown). No actual data 
manipulation takes place until the query request is sent to the mediator and a result is 
returned. 

Joins can take place within a data source or across data sources. Creating a join 
in BBQ is as simple as selecting one leaf element, and dragging and dropping it onto 
another leaf element. Continuing our example, suppose the user is interested in 
doctoral-track CSEStudents who are also interns, and whose advisor is also their 
internship supervisor. Figure 5 shows a join across CSEStudents and Interns, 
where CSEStudents.advisor is being bound to Interns.supervisor. 
CSEStudents.name is bound to Interns.name in the same manner. 
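
The meaning of the constraint and the two joins in this example can be spelled out with a small Python sketch. It evaluates the conditions directly over two in-memory documents instead of going through XMAS and the mediator; the instance data and the function name are hypothetical, while the element names follow Figure 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance data; only the element names come from Figure 1.
STUDENTS = ET.fromstring(
    "<CSEStudents>"
    "<CSEStudent><name>Ricardo Montablan</name><advisor>Ed Anderson</advisor>"
    "<degree>PhD</degree></CSEStudent>"
    "<CSEStudent><name>Pablo Nguyen</name><degree>MS</degree></CSEStudent>"
    "</CSEStudents>"
)
INTERNS = ET.fromstring(
    "<Interns>"
    "<Intern><name>Ricardo Montablan</name><supervisor>Ed Anderson</supervisor>"
    "<sponsor><NPACI/></sponsor></Intern>"
    "<Intern><name>Pablo Nguyen</name><supervisor>Ed Anderson</supervisor>"
    "<sponsor><UCSD/></sponsor></Intern>"
    "</Interns>"
)

def phd_interns_with_same_mentor(students, interns):
    """Names of PhD students who are interns supervised by their advisor."""
    matches = []
    for student in students.findall("CSEStudent"):
        if student.findtext("degree") != "PhD":
            continue  # the constraint set on degree
        advisor = student.findtext("advisor")
        if advisor is None:
            continue  # advisor is optional; without it the join cannot hold
        for intern in interns.findall("Intern"):
            same_name = student.findtext("name") == intern.findtext("name")
            same_mentor = advisor == intern.findtext("supervisor")
            if same_name and same_mentor:
                matches.append(student.findtext("name"))
    return matches

result = phd_interns_with_same_mentor(STUDENTS, INTERNS)
```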

Taking a page out of PESTO’s playbook, BBQ provides a mechanism similar to 
Query-In-Place, which we call filtering. Filtering implicitly performs the last three 
steps of the query cycle; the filter is immediately evaluated and results are made 
available without explicit construction of the head. In particular, filtering takes place 
as follows. While browsing the XML or DTD tabbed pane, the user may decide to 
constrain some elements of a collection, and view only those elements that satisfy 
the constraint. The constraint is set just as shown in Figure 4, but after dismissing 
the constraints dialog with “OK”, this time the user hits the EXECUTE QUERY 
button. The query result DTD is the DTD of the constrained selected elements and 
is computed by the view inference module [PV99] that is able to change the source 
DTD structure in order to account for the “refinement” induced by the conditions. 
For example, if the user places a condition on an optional subelement s of the 
selected element x, the query result DTD indicates that now s is actually a required 
field of x (since if s were absent x would not qualify for the answer). The XML 
tabbed pane now shows only XML data that satisfy the constraint. In general, if the 
head panel is empty and the user selects EXECUTE QUERY, a filter is executed on 
the body of the panel where the button was pressed. 

Figure 5. Setting a join across data sources. After supervisor’s contentspec is dragged and 
dropped onto advisor’s contentspec, both are marked with a join number {JOIN0}. Likewise 
for Interns.name and CSEStudents.name. 
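
Filtering and the accompanying DTD refinement can be sketched in a few lines of Python. This is only an approximation of the behaviour described in the text, not the view inference algorithm of [PV99]: it handles equality conditions on direct children and only the optional-to-required case, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def filter_instances(root, child, value):
    """Keep the instances (children of root) whose `child` text equals value."""
    return [inst for inst in root if inst.findtext(child) == value]

def refine_content_model(children, constrained):
    """Make an optional child required once a condition is placed on it.

    children: labels such as ["name", "advisor?", "degree"].
    constrained: names of the children that carry a condition.
    """
    refined = []
    for child in children:
        name = child.rstrip("?")
        if child.endswith("?") and name in constrained:
            refined.append(name)    # absent elements cannot satisfy the filter
        else:
            refined.append(child)
    return refined

doc = ET.fromstring(
    "<CSEStudents>"
    "<CSEStudent><name>Ricardo Montablan</name><advisor>Ed Anderson</advisor>"
    "<degree>PhD</degree></CSEStudent>"
    "<CSEStudent><name>Pablo Nguyen</name><degree>MS</degree></CSEStudent>"
    "</CSEStudents>"
)
kept = [s.findtext("name") for s in filter_instances(doc, "advisor", "Ed Anderson")]
refined = refine_content_model(["name", "advisor?", "degree"], {"advisor"})
```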



3.2 Constructing The Head 

Figure 6. Constructing the head with elements from both sources, using drag-and-drop 

Once the constraints are set, the next step in the query cycle is to construct a tree 
that the answer document(s) must conform to, called the head or query result tree. 
The right panel of BBQ’s main window is where the head is built. The head is 
composed of elements (and their sub-trees) dragged from source DTDs, and tags 
created on the spot with the “Create New Child” popup menu item. The query result 
tree must be a tree, and BBQ enforces this by allowing only a single root element. 
Figure 6 shows a query result tree that contains a newly-made root element 
CSEInterns, a newly-made child CSEIntern, and three elements: name 
dragged from CSEStudents, and supervisor and sponsor dragged from 
Interns. 

The “{GRP+}” string marking the CSEIntern element in Figure 7(b) indicates 
an XMAS GROUP-BY label. A GROUP-BY label is roughly equivalent to the GROUP-BY 
semantics of OQL; in our example, it indicates that there will be one 
CSEIntern element for each unique Interns.name element. To set GROUP-BY 
labels, the user right-clicks on an element in the head, and selects the “Group 
By|Other...” popup menu item. A dialog appears, and until the dialog is dismissed, 
any element the user selects is added to the list of GROUP-BY elements. Figure 7(a) 
shows Interns.name being selected as a group-by element for CSEIntern. 
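
One plausible reading of this grouping behaviour is sketched below in Python: a result document is built with exactly one CSEIntern element per distinct name. The precise XMAS semantics are not reproduced here; the input rows and the function name `build_head` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def build_head(rows):
    """Build a CSEInterns document with one CSEIntern per distinct name.

    rows: (name, supervisor, sponsor) tuples, e.g. the output of a join.
    """
    root = ET.Element("CSEInterns")
    by_name = {}
    for name, supervisor, sponsor in rows:
        intern = by_name.get(name)
        if intern is None:
            # First row for this name: open a new group.
            intern = by_name[name] = ET.SubElement(root, "CSEIntern")
            ET.SubElement(intern, "name").text = name
        ET.SubElement(intern, "supervisor").text = supervisor
        ET.SubElement(intern, "sponsor").text = sponsor
    return root

# Hypothetical join result; the first student has two internship rows.
rows = [
    ("Ricardo Montablan", "Ed Anderson", "NPACI"),
    ("Ricardo Montablan", "Ed Anderson", "CAIDA"),
    ("Pablo Nguyen", "Ed Anderson", "UCSD"),
]
head = build_head(rows)
group_sizes = [len(g.findall("sponsor")) for g in head.findall("CSEIntern")]
```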



3.3 Executing the Query 



The query we have constructed is now equivalent to “SELECT CSEStudent.name, 
Interns.supervisor, Interns.sponsor WHERE CSEStudent.degree = 'Ph.D' and 
CSEStudent.advisor = Interns.supervisor; Group by CSEStudents.name, within the 
element CSEIntern”. (This query notation is given just for understanding purposes. 
Figure 8 shows the actual query.) Once the user clicks the “Execute Query” button, 
we enter the third step of the query cycle. BBQ converts the visual layout into 
XMAS query language, contacts the MIX mediator and submits the query. Figure 8 
shows the generated XMAS query. We do not get into the details of XMAS syntax 
and semantics. We merely note that the WHERE clause follows XML-QL's [XML-QL] 
syntax and semantics while the CONSTRUCT clause has replaced XML-QL's 
grouping by Skolem-id’s with the more conventional grouping using group-by. 
Nevertheless, observe that this relatively complex query statement was created with 
a few simple GUI actions, and with no knowledge of the underlying query language. 

Figure 8. BBQ generates this XMAS query corresponding to GUI object layout 



The conversion step from GUI layout to XMAS query can be straightforward or 
quite complex. Constructing an XMAS query such as Figure 8 is relatively simple: 
assign variables to constrained elements and head elements, state head element 
variables in the CONSTRUCT clause, state the tree paths that lead to those elements 
in the WHERE clause, and finally state all constraints at the end of the WHERE 
clause. A more complex case is where a subtree is dragged to the head, and a node 
of that subtree is deleted. In this situation the tree path of the deleted node, and all 
the siblings of the deleted node must be explicitly enumerated in the WHERE 
clause so that the query can manage to keep everything but the deleted element. 
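
The simple conversion case described above (assign variables, list head variables, list paths, then list constraints) can be outlined as follows. The sketch deliberately returns a plain data structure instead of XMAS text, since the concrete XMAS syntax is not given here; the variable naming scheme, the path strings, and the function name `plan_query` are all hypothetical.

```python
def plan_query(head_elements, constraints):
    """Collect the ingredients of a CONSTRUCT ... WHERE ... query.

    head_elements: tree paths of the elements dragged to the head.
    constraints: (path, operator, operand) triples set on leaf elements.
    """
    variables = {}

    def var_for(path):
        # One variable per distinct path, numbered in order of first use.
        if path not in variables:
            variables[path] = f"$V{len(variables) + 1}"
        return variables[path]

    construct = [var_for(path) for path in head_elements]
    conditions = [(var_for(path), op, operand) for path, op, operand in constraints]
    return {
        "CONSTRUCT": construct,
        "WHERE": {
            "paths": list(variables.items()),  # tree paths bound to variables
            "conditions": conditions,          # stated at the end of WHERE
        },
    }

plan = plan_query(
    head_elements=["CSEStudents/CSEStudent/name"],
    constraints=[("CSEStudents/CSEStudent/degree", "=", "PhD")],
)
```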



4. MISCELLANEOUS ACCESSORIES 



4.1 Vertical Wildcards 



BBQ offers a unique feature for visually querying elements that can be reached by 
multiple paths: the vertical wildcard. It allows users to target elements without 
specifying their exact location in the DTD hierarchy. Suppose while browsing 
Figure 1(c), our user becomes interested in tenants living on the 3rd floor, regardless 
of living arrangements (ResHall or RitzyResHall). BBQ displays apartment 
as a subtree of both ResHall and RitzyResHall; the naive solution would be to 
find all instances of apartment, and for each instance set the floor condition, and 
drag tenant to the head. For larger DTDs it may be too time-consuming to find all 
instances of an element, so BBQ offers the following solution: the user selects the 
ResHalls element, then selects “Vertical Wildcard” from the popup menu. BBQ 
attaches a node labeled “ANY” to ResHalls; the user then drags and drops one of 
the apartment elements onto the “ANY” node. Finally, the user sets the 3rd floor 
condition on apartment.floor, and drags apartment.tenant to the head. Figure 9(a) 
shows the construction of the query, and Figure 9(b) shows the resulting XMAS 
query. 

Figure 9(b). The resulting XMAS query 

In general, users can drill down to any element in a DTD, attach the “ANY” 
node, then drag and drop any subtree from that DTD onto the “ANY” node. Once 
users set constraints on elements in the subtree, the vertical wildcard appears in the 
XMAS query; otherwise it is ignored. The vertical wildcard is a powerful construct
that we have not observed in any other visual querying system for semistructured 
data to date. 



4.2 Query History



BBQ allows the user to explore the last n query cycle transactions, where n is a
configurable positive number. Each click of the BACK button takes the BBQ state 
to the end of the previous query cycle, just before EXECUTE QUERY or “Query” 
was clicked. All constraints set at that point are re-bound to their respective 
elements, so the user can remember how an answer document was obtained. 
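
The history mechanism can be pictured as a bounded stack of constraint snapshots. The following Python fragment is only a minimal sketch under that assumption; it is not BBQ's implementation, and the class name `QueryHistory`, its methods, and the constraint keys are hypothetical names chosen for illustration.

```python
from collections import deque
from copy import deepcopy


class QueryHistory:
    """Keeps snapshots of the last n query cycles (hypothetical helper)."""

    def __init__(self, n):
        if n <= 0:
            raise ValueError("n must be a positive number")
        # A deque with maxlen silently drops the oldest snapshot when full.
        self._snapshots = deque(maxlen=n)

    def record(self, constraints):
        """Store the constraints as they were just before the query ran."""
        self._snapshots.append(deepcopy(constraints))

    def back(self):
        """Return the most recent snapshot, or None when history is empty."""
        return self._snapshots.pop() if self._snapshots else None


history = QueryHistory(n=2)
history.record({"apartment.floor": "3"})
history.record({"apartment.floor": "3", "tenant.name": "Smith"})
history.record({"apartment.floor": "4"})  # evicts the first snapshot

first_back = history.back()
second_back = history.back()
third_back = history.back()  # nothing left: only n = 2 cycles are kept
```

Copying the constraints on `record` keeps later edits in the interface from altering an earlier snapshot, which is what allows the constraints to be re-bound exactly as they were.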



4.3 The BBQ Application Programming Interface 



In some query domains, BBQ’s current interface may not be the most suitable 
display paradigm; for example an XML-based Geographic Information System 
(GIS) database may prefer to display its data as a clickable map. Therefore, to make 
BBQ applicable in different environments, we have decoupled the user interface 
from the mechanisms which manage the data structures and query sessions. The 
underlying mechanisms have been bundled together in a package with a simple 
programming interface, called the BBQ Application Programming Interface (BBQ
API). BBQ API aims to abstract away as much of BBQ’s inner workings as
possible, while allowing users to safely manipulate some of the core data structures.
Specifically, BBQ API gives client applications read-only access to tree
representations of all base DTDs, source DTDs, answer trees, and all their
corresponding XML documents. The XML trees are represented by Java 2’s
DefaultMutableTreeNode class, while the DTD trees are represented by our
own BBQTreeNode class, whose methods mirror DefaultMutableTreeNode.
By exposing the document data structures, we give client applications a great 
amount of flexibility; they are free to navigate about the trees and display their 
contents in any fashion. Also, by keeping the data structures read-only, we simplify 
our own internal book-keeping while preventing users from misaligning their view 
of the data source with the “real” view maintained by the mediator. 

The core of BBQ API is in the BBQSession class. Within BBQSession are 
all the calls necessary to perform a query session; a typical client application 
instantiates the class as the first step. From there, the application connects to a 
mediator specified by a URI, and receives a list of base DTDs that the mediator has 
available. The application opens an available base DTD with one call, and receives 
a set of three items: 1) a tree data structure representing the source’s XML content, 
2) a tree data structure representing the source’s DTD, and 3) an internal 
identification value for future transactions. Next, the application constructs an 
XMAS query using BBQSession calls and the given data structures. Finally, the 
application calls executeXMASQuery(), and receives a new set of three items,
which now represent the query result.
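
The call sequence of such a session can be sketched with an in-memory stand-in. The fragment below is written in Python purely for illustration and does not reproduce the BBQ API, which is a Java package: only the names BBQSession and executeXMASQuery() come from the text, while `MockSession`, `connect`, `open_base_dtd`, `execute_xmas_query`, `SourceHandle`, and the mediator URI are assumptions.

```python
from collections import namedtuple

# The "set of three items" described above; the field names are illustrative.
SourceHandle = namedtuple("SourceHandle", ["xml_tree", "dtd_tree", "source_id"])


class MockSession:
    """In-memory stand-in for a BBQSession-like object (not the real API)."""

    def __init__(self):
        self._mediator = None
        self._next_id = 0

    def connect(self, mediator_uri):
        """Pretend to contact a mediator and list its base DTDs."""
        self._mediator = mediator_uri
        return ["ResHalls"]

    def _handle(self, label):
        self._next_id += 1
        # Trees are plain tuples here, so clients cannot modify them.
        return SourceHandle((label, ()), (label + "-dtd", ()), self._next_id)

    def open_base_dtd(self, name):
        """Return the XML tree, the DTD tree and an identification value."""
        return self._handle(name)

    def execute_xmas_query(self, query_text):
        """Return a triple of the same shape, now describing the answer."""
        return self._handle("answer")


session = MockSession()
available = session.connect("mediator://example")  # placeholder URI
source = session.open_base_dtd(available[0])
result = session.execute_xmas_query("CONSTRUCT ... WHERE ...")  # query elided
```

Because the query result has the same shape as an opened source, a client can display and query an answer document with the same code it uses for a base DTD.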

As a proof of concept, the BBQ interface discussed in this paper is in fact a client 
of the BBQ API. Currently, we are laying the groundwork for BBQ API to be used 
in the engine for a query tool whose domain is the Protein Data Bank. From this 
proposed project we expect valuable feedback regarding the API’s flexibility, 
completeness and simplicity in different query domains. 



5. CURRENT STATUS AND FUTURE WORK 



Almost all of BBQ’s interface has been completed. We are currently working on the 
Query History modules. There are plans to create an XML browser for a more user- 
friendly display of XML instances; efforts in that direction would hinge upon a more 
finalized XSL (or equivalent standard) specification. Also, some policy needs to be 
devised regarding BBQ updating stale persistent virtual views when their constituent 
elements change or disappear. 
Abstract

We present a meta-data driven query language (MDDQL) as a visual query 
language. The query language has been specified in such a way that interpre- 
tations of data values, attributes and schemas are explicitly taken into account 
during query construction. These can be expressed directly in the structure of 
MDDQL terms which are represented as objects within a meta-data or ontologi- 
cal database. They are mainly classified into two categories: domain of interest 
and operations. To ease query construction over large and hard-to-understand
conceptual schemas and value domains, the intended queries are constructed
incrementally, under system guidance, on a Web-based blackboard used as a
visual query interface.
Therefore, query construction becomes a matter of moving amongst consistent 
query states. Inferences about a potential query consistent state to move onto are 
made by the query language interpreter which implements a kind of state automa- 
ton. The inferences are based on the background knowledge, as represented in 
terms of MDDQL term objects and their connectionism issues forming a cyclic 
directed graph, and the context of the current query state. Each consistent query 
state includes a subset of MDDQL terms as inferred by the language interpreter. 
These might refer to complex or more abstract terms given the recursively de- 
fined structures of MDDQL terms. This is particularly useful when large schemas 
and/or well-restricted value domains are addressed. 

Keywords: Meta-data, Knowledge representation, Finite State Automata, Visual Query
Languages and Systems

1. INTRODUCTION 

Querying a database managed by a Data Base Management System (DBMS)
is usually done by special purpose languages, called query languages, which
are defined as mappings from a Universe of Discourse into a subset of it, which
is expressed by the query result. However, the most widely used database
query languages require knowledge about language syntax, information and
full understanding of the application domain. None of these query languages
directly addresses the meaning of the data when a query is being constructed.
This is particularly crucial when large or complex database schemas are queried
by end-users who are not experts in a specific application domain, or who are
not willing to understand the underlying data model and/or the interpretation
of data values as stored in a DBMS.

Consider, for example, two application paradigms: a) a Regional Avalanche
Information and Forecasting System - RAIFoS, which is concerned with the
collection and querying of a large number (ca. 60-80) of physical parameters,
b) a Mines Information System - MinesIS, which is concerned with more than 200
attributes organised within a data model referring to dissemination of various
kinds of explosive devices over regions involved in war.

In both cases, besides the difficulties of coping with the semantics of a 
database model or schema, addressing particular values for conditional state- 
ments of queries requires a thorough knowledge of well-restricted value do- 
mains, such as {southern slopes, eastern slopes, northern slopes, unknown 
exposition, no avalanche, in various or all expositions, wind protected slope} 
referring to exposition of avalanche as an attribute. Additionally, they are all
encoded in the storage model such as {0, 1, 2, 3, ...}, respectively. In the
following, we refer to all these symbols - including symbols for database schema
elements such as attributes or relations/classes - standing for something else as
implementation symbols.

Despite the fact that visual query systems and/or languages are a step towards 
an end-user friendly way of querying a database in that query language syntax 
is avoided, the embedding of meaning and/or interpretation of data during 
query construction is not considered. Thus formulation of a query requires 
interpretation not only of the data model, but also of the values themselves 
to be considered for conditional statements. Moreover, query construction 
through graphical representation of entire conceptual models turns out to be 
overwhelming, especially, when large conceptual schemas are considered. In 
addition, no reference to well-restricted value domains is made at the level of 
conceptual schemas. 

In order to alleviate the task of formulating a semantically well-defined 
query without knowledge about language syntax and with no need of database 
schema and values interpretation, we elaborated a Web-based Visual Query 
System, which provides user interaction based on a Meta-Data Driven Query 
Language - MDDQL with a graph-based visual formalism. The alphabet of 
MDDQL consists of two main subsets: a) Domain of Interest terms, b) terms 
standing for operations. Domain of Interest terms refer not only to conceptual 
schema elements but also to values as elements of sets of well-restricted value
domains. 

Since the terms of MDDQL are a means of incorporating interpretation of 
implementation symbols, they are conceived as objects having attributes which 
refer to natural language words and/or annotated descriptions. Therefore, 
query construction is done on the basis of a semantic triangle as given by the 
triple <term, word, symbol>. However, in order to minimise the query 
construction effort, the incremental query construction process takes place in 
terms of system suggestions which are inferred on the basis of the current query 
context and the background knowledge as represented by the structure of the 
terms as well as their connectionism. 

Given that preconditions are also encountered for well-restricted value do- 
mains, semantic constraints also apply to the consideration of conditional values 
of the intended query. Similarly, operations are assigned to query constructing 
elements on the basis of semantic constraints. 

The construction of the query takes place on a web-based blackboard as a 
visual query interface of MDDQL and relies on a WIMP (Window, Icon, Menu, 
Pointer) user interface. Windows are used as containers for the presentation 
of a set of terms to be suggested to the user due to the current state of query 
construction. Icons might accompany the visual representation of terms, such 
as images referring to particular terms from the set of data values or symbols 
for operations. A set of suggested terms is presented to the end user when 
a particular query term is activated. Entry points are any initial terms which 
address main concepts as a minimal subset of all potential concepts. 

Background and related work. The importance of avoiding an under- 
lying query language formalism when end-users need to pose queries to a 
database system has received much attention during the last 10-15 years within 
the database research community and many Visual Query Systems (VQS) 
and/or Visual Query Languages (VQL) have been developed to alleviate 
the end-users’ tasks. A survey of these approaches is given in (Catarci et al., 
1997). 

VQSs can be seen as an evolution of query languages adopted in database 
management systems in order to improve the effectiveness of Human-Computer 
Interaction (HCI). Thus, their most important features are those that determine 
the nature of the human-computer dialogue, in order to maximise user task 
performance (Chan et al., 1998). They mainly rely on the integration of the 
data model and query language in a user-database interface (Chan, 1997) as 
well as presentation and interaction components that together form a graphical 
user interface (Murray et al., 1998a). 

In particular, query languages based on visual formalisms have been proposed
(Cardiff et al., 1997; Florescu et al., 1996; Papantonakis and King,
1995; Merz and King, 1994; Meoevoli et al., 1994; Clark and Wu, 1994; Haw
et al., 1994; Berztiss, 1993; Ozsoyoglou and Wang, 1993; Siau et al., 1992) in
order to tackle the problem of query construction. In all these approaches, the
conceptual or implementation model is mainly integrated into the query
paradigm. However, query construction takes place in terms of navigation through a
graphical representation of the entire model and without any consideration of 
semantic constraints such as posed by well-restricted domain values and/or by 
their interpretation, mutually exclusive properties and/or values, etc. 

This is also not the case for graph based visual formalisms for both Object- 
Oriented DBMSs where traversal paths can be expressed as queries (Yu and 
Meng, 1998; Chavda and Wood, 1997), and for a Web query system (Li and 
Shim, 1998), where a visual user interface - WebIFQ (Web In-Frame-Query) - is 
used to assist users in specifying queries and visualising query criteria including 
document meta-data, structures, and linkage information. 

Traversal-like approaches using graph queries or high level concepts also
underlie the development of query interfaces for large clinical databases (Liepins
et al., 1998; Taira et al., 1996; Hripcsak et al., 1996; Banhart and Klaeren,
1995), or for general purpose systems (Chu et al., 1993; Doan et al., 1995; Chu
et al., 1996; Murray et al., 1998a; Murray et al., 1998b; Gil et al., 1999; Zhang
et al., 1999). The query generators mostly use an object-oriented data model
or functional models, as in the case of (Gil et al., 1999).

Despite the fact that in all these approaches query construction is done by 
navigational issues where the end user does not need to learn a particular query 
language, it is very often hard to operate on complex or large diagrammatic 
representations, especially when large database schemas must be considered. 
Furthermore, the query elements are not inferred on the basis of semantic con- 
straints which hold amongst attributes, well-restricted value domains and/or 
operations, which might lead to semantically incorrect queries. 

Even in cases such as (Gil et al., 1999; Zhang et al., 1999) where data values 
are considered for the incremental formulation of the final query, these values 
are not addressed within well-restricted value domains and cannot be subject of 
a semantically meaningful consideration of values for conditional statements 
given the current query context. Without such a meaningful consideration of 
values, queries might be constructed the results of which are, semantically 
speaking, not worth addressing and might lead to expensive operations without 
considerable results. In (Gil et al., 1999), where values can be taken from 
intermediate results during query construction and further refine the final query, 
we are still faced with the problem of addressing values from pre-calculated 
results which, in case of large databases, might exceed several hundreds of rows 
or tuples. 
Organisation of the paper. In this paper, we focus on MDDQL as a Visual
Query Language at the core of a Web-based Query Answering System. Section 2
gives an overview of MDDQL in terms of its constituting components,
the meta-data database as a repository of MDDQL terms represented as objects, 
the interpreter realising a kind of state automaton underlying the system guided 
query construction, and the visual query interface and formalism as a front-end. 
Section 3 covers the user interaction strategy for the query construction pro- 
cess. In particular, we focus on a scenario referring to the query construction 
paradigm due to the scientific application domain of RAIFoS. Finally, some 
formal aspects underlying MDDQL are presented in section 4. 

2. LANGUAGE COMPONENTS AND OVERVIEW 

Figure 1 depicts a general overview of MDDQL as a Web-based, Visual 
Query Language. It consists of three components: (a) a Visual Query Interface, 
(b) an Interpreter realising a kind of finite state automaton, and (c) a meta- 
data database, as a repository of MDDQL terms. Since all components are
implemented in Java and OMS Java - a Java-based realisation of the OM model -
an Applet-based version of MDDQL is also available and capable of running
within a Web Browser.

As part of a Web-based Query Answering System relying upon a three-tier
architecture, all components can be downloaded to the client from an application
server at a middle layer where the meta-data database resides. Thus the
query construction process takes place locally at the client side.

Before having a closer look at these components, we first turn to some
definitions underlying the concepts of MDDQL as a visual query language. We
strongly rely on the formal specification of a visual query system given by the
triple <M, V, C>, where M refers to the internal representation of MDDQL terms
(2.1), V to the views called query nodes (2.2), and C to the visual presentation
of query nodes (2.3).

2.1. THE META-DATA DATABASE 

An MDDQL term is conceived as a unit of thought and is represented as an 
object which is assigned a unique identifier. A term might be associated with 
one or more words. Furthermore, a term is associated with implementation 
symbols which stand for storage encodings. Therefore, we specify a query 
semiotics triangle built up from terms, words and implementation sym- 
bols. An implementation symbol is any kind of symbol used for the internal 
storage of values and/or attributes, whereas a word is a natural language ele- 
ment. 

Figure 1 MDDQL components

For example, the terms OM600, OM601, OM602, OM603, OM604, OM605
as part of the domain of interest are assigned the words radiation, temperature,
exposition, northern slopes, eastern slopes, no avalanche, respectively. They
might also be assigned the words Strahlung, Temperatur, Lage, Nordhang,
Osthang, keine Lawine, respectively. On the other side, the terms are assigned the
implementation symbols RW, TA, LA, 2, 1, 0. Note that the three last terms
form a well-restricted value domain for the property exposition (English) or Lage
(German). Additionally, the arithmetic intervals [0, 120] and [-50, 50] are also
represented as terms which stand for well-restricted value domains referring to
radiation (OM600), temperature (OM601), respectively.
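
The triangle of terms, words and implementation symbols from this example can be written down as a small lookup structure. The following Python fragment is a sketch only: the identifiers, words and symbols are the ones listed above, whereas the `Term` class, the language codes "en" and "de", and the function `symbol_for` are assumptions made for illustration and say nothing about how OMS Java stores terms.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    oid: str      # unique term identifier, e.g. "OM600"
    symbol: str   # implementation symbol used for internal storage
    words: dict = field(default_factory=dict)  # language code -> word


TERMS = [
    Term("OM600", "RW", {"en": "radiation", "de": "Strahlung"}),
    Term("OM601", "TA", {"en": "temperature", "de": "Temperatur"}),
    Term("OM602", "LA", {"en": "exposition", "de": "Lage"}),
    Term("OM603", "2", {"en": "northern slopes", "de": "Nordhang"}),
    Term("OM604", "1", {"en": "eastern slopes", "de": "Osthang"}),
    Term("OM605", "0", {"en": "no avalanche", "de": "keine Lawine"}),
]


def symbol_for(word, language):
    """Resolve a natural language word to its implementation symbol."""
    for term in TERMS:
        if term.words.get(language) == word:
            return term.symbol
    raise KeyError(word)


english = symbol_for("northern slopes", "en")
german = symbol_for("Nordhang", "de")
```

Both lookups end at the same term and therefore at the same implementation symbol, so the choice of natural language has no effect on what is finally sent to the database.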

Thus it is possible to assign words in different languages to the same term and 
still keep on referring to the same implementation symbol. This enables query- 
ing by using elements of different natural languages without any side effects 
upon the query results. The set of all MDDQL terms will be called alphabet and 
is mainly divided into two subsets: domain of interest and operations to apply 
on it. All terms (objects) are managed and provided by a meta-data database the 
underlying connectionism model of which is a cyclic directed graph connecting 
terms, which belong to the subset of domain of interest terms. 

Since terms are conceived as objects, additional attributes can provide more 
characteristic properties for particular terms. For instance, a description at- 
tribute can provide a more descriptive piece of information referring to the 
notion of a term, such as reflection intensity of solar radiation for radiation
(OM600), or the measurement unit in which an arithmetic interval is expressed
as a well-restricted value domain for an attribute, such as watt per square meter
for radiation (OM600).

Moreover, terms (objects) are assigned a role as members of collections 
or classes. We mainly distinguish among the roles of concepts, relationships, 
properties (categorical, numerical), concrete value domains, all of which might 
be atomic or complex. For instance, the terms OM600 and OM601 are classified
as properties (numerical), since their assigned well-restricted value domains are
arithmetic intervals, whereas the term OM602 is classified as a property
(categorical).

Analogously, operations such as OM700, OM701, OM702, OM703, OM704
are assigned the natural words minimum, maximum, absolute frequency,
distribution, mean value and might also be assigned the implementation symbols
min, max, AF, Di, MV. Furthermore, they are also subject to a classification schema
such as one-dimensional, two-dimensional, categorical, numerical, etc.

The connectionism model provides three kinds of mappings: a) those holding
for terms which are members of the domain of interest but have distinct
classifications - interconnecting links, b) those holding for terms which are
members of the domain of interest but belong to the same class - recursive
links, c) those between collections of domain of interest terms and operations terms.
Recursive N : M mappings over the same class of terms enable the formation
of more complex elements such as assignment of concrete values to concrete
value domains, the construction of complex attributes and/or concepts such
as coordinates being a property defined over properties: height, longitude,
latitude, or Automatically measured data being a concept defined over concepts:
ENET data, IMIS data.

Therefore, construction of a query becomes a matter of navigation through
a so-called query information space QIS as formed by all terms of MDDQL.
QIS is conceived as a cyclic directed graph $G_q = (V, E)$, where $V$ is the set
of vertices $v_n$ standing for terms, and
$E = \{(v_{n-1}, v_n) \mid v_{n-1} \text{ is a parent of } v_n\}$ is the set of directed
edges standing for the mappings among terms. $V_I \subseteq V$ would be the set of
entry vertices without any incoming edges.
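
The graph view of the query information space can be made concrete with a few lines of Python. This is a sketch under our own assumptions: the vertices are a small, acyclic excerpt of RAIFoS terms used later in the paper, the edges between them are chosen for illustration, and the function names `entry_vertices` and `children` are hypothetical.

```python
def entry_vertices(vertices, edges):
    """Return the vertices of the graph that have no incoming edge."""
    targets = {child for _parent, child in edges}
    return {v for v in vertices if v not in targets}


def children(vertex, edges):
    """Return the vertices reachable from `vertex` over one directed edge."""
    return {child for parent, child in edges if parent == vertex}


# Vertices stand for terms; each edge is a (parent, child) pair.
V = {
    "Automatically delivered measurement data",
    "Observers data",
    "IMIS-100 data",
    "Atmospheric data",
    "Temperature",
}
E = {
    ("Automatically delivered measurement data", "IMIS-100 data"),
    ("IMIS-100 data", "Atmospheric data"),
    ("Atmospheric data", "Temperature"),
}

V_I = entry_vertices(V, E)
```

In this excerpt the entry vertices are the two terms that no other term points to, which matches the role of entry points as the concepts a query can start from.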

2.2. MDDQL INTERPRETER

Since query construction takes place without any knowledge of language 
syntax, the user is guided by the system in order to pose her/his intended query. 
This is achieved by moving through consistent query states when the current 
query state is given. Thus the inference of which subsequent query states are 
consistent with the current one is provided by the MDDQL-Interpreter on the 
basis of a formally specified state automaton. 
Definition. A query state $S_q$ is a set of nodes such that $S_q \subseteq V$. An
initial query state is a set of initial nodes $S_{q_0} \subseteq V_I$.

Definition. For a given query state $S_{q_n}$, the set of candidate terms
$C_n$ is defined as
$C_n = \{ v_j \mid \text{for some } v_i \in S_{q_n},\ (v_i, v_j) \in E \wedge v_j \notin S_{q_n} \}$.

Definition. A precondition is a formula $p_{v_m} \in P_V$ in conjunctive
(AND-connected) or disjunctive (OR-connected) normal form connecting vertices
(terms) $v_i \in V$, where $P_V$ is the set of all preconditions associated with
particular vertices $v_m \in V$. Let us call $p'_{v_m}$ the set of vertices appearing
in $p_{v_m}$. It holds that $p_{v_m}$ is true if $\exists\, v_i \in p'_{v_m}$ such that
$v_i \in S_{q_n}$ for disjunctive preconditions, or
$\forall\, v_i \in p'_{v_m}: v_i \in S_{q_n}$ for conjunctive preconditions.

Definition. For a given query state $S_{q_n}$, a set of candidate terms $C_n$ is
consistent if, for each $v_j \in C_n$, either there is no precondition $p$ associated with
$v_j$ or $p$ evaluates to false for the given query state $S_{q_n}$.

Definition. For a given query state $S_{q_n}$, a potential query state
$S_{q_{n+1}}$ is $S_{q_n} \cup C_n$, where $C_n$ is a consistent set of candidate terms
for $S_{q_n}$.
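
The definitions above translate almost literally into set operations. The Python fragment below is a sketch and not the MDDQL-Interpreter itself: the encoding of a precondition as a pair (kind, vertices) and all function names are assumptions, and the example preconditions on Radiation and Humidity are our own reading of the RAIFoS scenario in section 3.1.

```python
def candidate_terms(state, edges):
    """C_n: children of some vertex in the state that are not in the state."""
    return {vj for vi, vj in edges if vi in state and vj not in state}


def precondition_is_true(precondition, state):
    """Evaluate a precondition given as (kind, vertices), kind 'or' or 'and'."""
    kind, vertices = precondition
    if kind == "or":
        return any(v in state for v in vertices)
    if kind == "and":
        return all(v in state for v in vertices)
    raise ValueError("unknown precondition kind: " + kind)


def consistent_candidates(state, edges, preconditions):
    """Keep candidates with no precondition or whose precondition is false."""
    return {
        v
        for v in candidate_terms(state, edges)
        if v not in preconditions
        or not precondition_is_true(preconditions[v], state)
    }


def potential_state(state, edges, preconditions):
    """The next state: the current state united with a consistent C_n."""
    return set(state) | consistent_candidates(state, edges, preconditions)


ENET = {"ENET 10-minutes data", "ENET 1-hour data"}
E = {
    ("Atmospheric data", "Radiation"),
    ("Atmospheric data", "Humidity"),
    ("Atmospheric data", "Temperature"),
}
# Assumed preconditions: they become true once any ENET term is in the state.
P = {"Radiation": ("or", ENET), "Humidity": ("or", ENET)}

imis_state = {"Automatically delivered measurement data", "IMIS-100 data",
              "Atmospheric data"}
enet_state = {"Automatically delivered measurement data", "ENET 1-hour data",
              "Atmospheric data"}

with_imis = consistent_candidates(imis_state, E, P)
with_enet = consistent_candidates(enet_state, E, P)
```

Following the definition of consistency, a term is kept when its precondition evaluates to false, so in this encoding a precondition names the terms whose presence rules a candidate out.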

The set of consistent candidate terms is inferred each time
a particular term is activated by the end user. Activation means that the end user
is interested in extending and/or refining her/his query. The communication
between Interpreter and Visual Query Interface takes place in terms of inter- 
changing objects which are representing query nodes. These are constructed 
and provided by the interpreter to the visual query interface. 

The query nodes correspond to the views of terms. They include a subset of 
the available information concerning terms. Currently, it is restricted to slots 
such as 

OID, label (word), description, connector, role, operation, link label, link 
type 

where label refers to the natural language word, description refers to 
term annotations, role to the role of term due to the classification model men- 
tioned in 2.1, connector to the logical connector (AND, OR, NOT), link 
label to the kind of link (IS-A, part-of, etc.), link type to the notion of link 
such as assertional or definitional. Depending on the case, some of these slots 
have to be initialised by the selections made by the end user. For instance, name 
when an arithmetic interval or value has to be set, or operation when a particular 
operation is assigned out of a set of semantically consistent operations (see also 
section 3.2). 
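
A query node with the slots listed above can be sketched as a plain record. The Python fragment below is an illustration under our own assumptions, not the object layout used by the interpreter: the slot names follow the list above, while the class `QueryNode`, the default values and the helper `unset_slots` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryNode:
    oid: str                           # object identifier of the term
    label: str                         # natural language word
    description: str = ""              # term annotation
    connector: Optional[str] = None    # logical connector: AND, OR, NOT
    role: str = ""                     # role from the classification model
    operation: Optional[str] = None    # set once the user assigns an operation
    link_label: Optional[str] = None   # kind of link, e.g. IS-A or part-of
    link_type: Optional[str] = None    # assertional or definitional

    def unset_slots(self):
        """Names of the slots that still have to be initialised."""
        return [name for name, value in vars(self).items() if value in (None, "")]


node = QueryNode(oid="OM601", label="temperature", role="property (numerical)")
before = node.unset_slots()

node.operation = "maximum"  # a user selection among the suggested operations
after = node.unset_slots()
```

Leaving slots empty until the user makes a selection mirrors the remark that some slots are initialised only by the end user.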
2.3. THE VISUAL QUERY INTERFACE

The query construction takes place on a blackboard and relies upon a graph- 
based notation. Each query node is represented by a rectangle, in different
colours, or an ellipse, denoting the role of the term (query node) within
the query such as concept, relationship, property, or specific value. Figure 2 
gives an example of a query expressed by MDDQL elements. Representation 
of the query nodes is based upon the contents of the query nodes as objects 
delivered by the interpreter. 




Figure 2 An example query expressed in MDDQL terms 



Nodes are labelled by assigned words of a specific natural language - in 
our example English - and are linked with other nodes constituting a query 
graph. Links can be labelled optionally indicating the notion of the link by 
using, in turn, specific natural language words such as is-a/as, composed 
of, characterised by, etc. Additionally, annotations of terms can be viewed
by pointing the cursor at a particular term. On the other hand, operations are
assigned to nodes by using specific icons (more details are given in 3.2).

Since operations are also subject to semantic inferences, they must be selected
out of a set of suggested operations which are semantically consistent with
classes of nodes or terms. They are all presented by their natural language
counterpart. Therefore, the operation symbols are only indicative of which
terms have been assigned operations. Logical connectors (AND, OR, NOT)
are also foreseen and are separated from the other operators. Nodes standing
for values can be negated by turning them to dark shaded rectangles.

3. INTERACTION STRATEGY 

For each query construction session, the user addresses the corresponding
URL for a particular application domain. An initialised Applet is downloaded
which includes the Java packages for the visual formalism as implemented with
the Swing library. Additionally, the Java package for the MDDQL Interpreter
and the OMS Java files for the meta-data schema providing the representational
issues of MDDQL-terms, together with the instances (terms) themselves which
correspond to the relevant application domain, are also downloaded to the
client site. Therefore, the first two front-end layers (figure 1) together with a
copy of the meta-data database for MDDQL-terms are available in the main
memory of the client’s machine. In the following scenario describing the user
interaction strategy for query construction, we distinguish between construction
issues referring to terms of the Domain of Interest (3.1) and those referring to
Operations (3.2).

3.1. VIEWING THE DOMAIN OF INTEREST 

At the beginning of a query construction session, the user is requested to choose
a preferred natural language in which the annotated nodes and links, as well 
as additional elements, are going to appear. For the sake of convenience, we 
will restrict ourselves to English as a presentation language in the following 
examples, and to paradigms which refer to the application domain of RAIFoS. 
We will also refer to terms by using the corresponding natural word and not the 
object identifier, when possible, since internal representation issues of terms 
are hidden from the end user. 
Figure 3 A proposed set of MDDQL terms to start with



Assuming that the end user either has no idea or does not need to gain
some knowledge about the underlying database issues, she/he is required to
choose an initial term out of the set of initial terms $V_I$ as provided by the
system (MDDQL-Interpreter). This puts the emphasis on the main subject, a
concept with which we would like to start constructing a query. For example, an
initial set of terms could be {Automatically delivered measurement data,
Observers data, measurement network, measurement station} (figure 3). If
the term Automatically delivered measurement data is
chosen, a query node is constructed out of the terms, as explained in section
2.1, and positioned on the blackboard.

In order to move further towards query refinements and/or restrictions, the user
has to click on a query node. Consequently, the system (MDDQL-Interpreter)
comes up with a set of potential query nodes (terms which are presented within
a popped-up selection window). One or more of the suggested terms might be
selected and, therefore, linked with the clicked node either (a) as being a special
case of a previous term (query node) - recursive links, e.g., ENET 10-minutes
data, ENET 1-hour data, IMIS-100 data might be connected to automatically
delivered measurement data, or (b) as assigned characteristics through
interconnecting links such as properties, for instance time, characterising a particular
concept or relationships, for instance measured by, leading to other concepts
(see also figure 4).

Let us assume that the user wishes to further refine her/his query in that 
IMIS-100 data further refines the current query state {automatically delivered 
measurement data} which now becomes {automatically delivered measurement 
data, IMIS-100 data} (checked box in figure 4). A further refinement will 
be caused by clicking on the query node IMIS-100 data which results in a 
suggestion consisting of the set of terms {Wind data, Atmospheric data, Snow
data} (see also figure 5), since all these kinds of terms are conceived as part of
ENET 10-minutes data, ENET 1-hour data or IMIS-100 data.

Selecting Atmospheric data as a term to extend the current query state, which
now becomes {automatically delivered measurement data, IMIS-100 data,
Atmospheric data}, a further query refinement could be done by clicking on
the query node Atmospheric data. The set of consistent potential terms would
be measured by, Time, Radiation, Humidity, Temperature (see also
figure 5). Note that the terms Time, measured by are suggested at both query
extension stages (figures 4 and 5), since they are relevant to both activated query
nodes IMIS-100 data and Atmospheric data. This corresponds to the
internal representational model for MDDQL terms (see also section 2.1) which
enables multiple inheritance.

Having selected time or measured by at a particular stage, these terms will
not be further suggested within a particular inheritance hierarchy, indicating the
fact that they are all relevant regardless of the refinement degree of a particular
concept. Further semantic constraints also apply to the suggestion of properties
when they are inferred by the MDDQL-Interpreter as a consistent set of potential
nodes (terms) (see also section 2.2) to be further considered for the extension
of the current query state or graph.

Figure 4 An extension of current query state by adding the term IMIS-100 data

Figure 5 A potential extension of current query state by activating the node Atmospheric
data

For instance, the properties Radiation and Humidity would not have been
suggested if one of ENET 10-minutes data or ENET 1-hour data had been
considered within the current query state, since it makes no sense to address these
properties in conjunction with these concepts. Even if the query node
automatically delivered measurement data is re-activated, the term IMIS-100
data will not appear within the set of potential terms, since it is already
considered within the current query state.

Selected terms extend the current state of the query graph in that all selected
terms are positioned on the blackboard by drawing the appropriate links
among them. This user-system interaction principle can be applied to all
nodes (terms) belonging to the current state of the query graph. For instance,
selecting the terms temperature, humidity, radiation will extend the
current query state to {automatically delivered measurement data, IMIS-100
data, Atmospheric data, Temperature, Humidity, Radiation}. Clicking on the node
temperature will provide the term [-50, 50] Celsius as a well-restricted value
domain within which conditional values have to be specified (see also figure 6).
Any values specified outside this particular range of values will be rejected.

Figure 6 A well-restricted value domain for arithmetic values



Similar restrictions also apply to almost all kinds of well-restricted or concrete
value domains. For example, the property direction of wind is assigned
two possible well-restricted value domains, and which of them is considered
depends on the query context: the interval [0, 360] degrees or the set of
categorical values {North, North-East, North-West, South, ...}. The
arithmetic interval will be suggested by the MDDQL-Interpreter only if the current
query state is something like {Automatically delivered measurement data,
ENET 10-minutes data, Wind data, Direction}. The set of categorical values
will be presented, as depicted in figure 7, only when the current query state
is {Observers data, Wind data, Direction}.

Figure 7 A well-restricted domain with categorical values
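
The context-dependent choice of a value domain described above can be illustrated with a minimal sketch. It is not the MDDQL-Interpreter itself: the lookup table `VALUE_DOMAINS`, the classes `Interval` and `Categories`, and the function `suggest_domain` are hypothetical names introduced only for this illustration, and the category list is truncated exactly as in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Arithmetic value domain, e.g. [-50, 50] Celsius."""
    low: float
    high: float
    unit: str

    def accepts(self, value) -> bool:
        # Values outside the interval are rejected.
        return isinstance(value, (int, float)) and self.low <= value <= self.high


@dataclass(frozen=True)
class Categories:
    """Categorical value domain, e.g. {North, North-East, ...}."""
    values: frozenset

    def accepts(self, value) -> bool:
        return value in self.values


# Hypothetical lookup table: (property, concept in the query state) -> domain.
VALUE_DOMAINS = {
    ("Temperature", "IMIS-100 data"): Interval(-50, 50, "Celsius"),
    ("Direction", "ENET 10-minutes data"): Interval(0, 360, "degrees"),
    # The text elides the remaining categories; only the named ones are listed.
    ("Direction", "Observers data"): Categories(
        frozenset({"North", "North-East", "North-West", "South"})
    ),
}


def suggest_domain(query_state, prop):
    """Return the value domain of `prop` that fits the current query state."""
    for concept in query_state:
        domain = VALUE_DOMAINS.get((prop, concept))
        if domain is not None:
            return domain
    return None
```

With this table, the query state {Observers data, Wind data, Direction} yields the categorical domain, whereas a state containing ENET 10-minutes data yields the interval [0, 360] degrees, so a value such as 400 would be rejected.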



A further interaction issue is the possibility of activating the labelling of
links, which gives an insight into the connectionism semantics of the query
nodes (terms). For example, the query state {Automatically delivered measurement
data, IMIS-100 data, Humidity, Radiation, Temperature, [-20, 10]},
assuming the user specified [-20, 10] as value constraint for temperature, can
be turned into {Automatically delivered measurement data, as, IMIS-100 data,
as, Atmospheric data, Characterised by, Humidity, Characterised by, Radiation,
Characterised by, Temperature, Constrained by, [-20, 10] Celsius}. It
is also possible to obtain a more detailed description of each node (term) by
pointing the cursor at the relevant term either on the blackboard or within the
pop-up window with the suggested terms, just before a particular term is
selected.

Thus it could be possible to gain more information about a particular term 
before considering it further for the intended query. For instance, the term 
Radiation is annotated by The reflected radiation from snow, providing an 
additional explanation of the notion of this term. Similarly, measurement units 
are also provided as annotations for arithmetic intervals, in order to illustrate 
the fact that the arithmetic values are expressed in a specific measurement unit 
(figure 6). 

Since further restrictions can be included in terms of relationships associating 
concepts, as suggested by the MDDQL-Interpreter at a current query state, the 
natural language based interpretation of the intended query, as depicted in figure 
2, would be: 

Get the humidity and radiation of the atmosphere as part of IMIS-100 data,
as part of automatically delivered measurement data, where the temperature is
within [-20, 10] Celsius and are measured by measurement stations located at
Swiss Alps region and for the time between 1-10-96 and 30-5-97.

3.2. ASSIGNING OPERATIONS TO QUERY NODES 

Having constructed an intended query in terms of nodes which refer to MDDQL
terms as members of the set of Domain of Interest, the user might assign
operations to particular nodes according to their roles within the query. This
can be done by clicking on a node with the right mouse button in order to
receive a semantically coherent set of operations which can be assigned to the
node. The potential set of operations is also inferred by the MDDQL-Interpreter
and appears within a pop-up window similar to the suggestion of terms from
the domain of interest.

These operational inferences take into account the actual role of an MDDQL
term as given by the classification hierarchies underlying the meta-data database
model as well as the current query state. If a node is assigned
the role categorical variable, indicating that its values correspond to
categories such as North, North-West, North-East, ... even if they are internally
encoded by numerical values such as 0, 1, 2, ..., no arithmetic operations such
as average or mean value, deviation will be suggested by the system
for consideration. Similarly, comparison operators such as >=, <=, <> are
only suggested for nodes which stand for atomic arithmetic values.

On the other hand, two- or three-dimensional operations such as time series,
scatter diagrams, histograms are only considered when
more than one property is present within the current query state. Furthermore,
preconditions might be assigned to the objects representing operations within
the meta-data database. Since operations are objects too, additional attributes
might be defined which provide more information about the intended usage of
each operation. This information can be viewed when selecting a particular operation
out of the set of potential operations by pointing the cursor at a candidate term
(operation) (see also figure 8).
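
The role-based inference of operations can be sketched as follows. This is only an illustration under stated assumptions: the role names follow the text, but the function `suggest_operations`, the three operation lists and the rule that multi-dimensional operations are offered only for variable nodes are hypothetical simplifications, not the actual inference mechanism.

```python
# Operation names as mentioned in the text; the grouping is an assumption.
STATISTICAL = ["average", "deviation"]
COMPARISON = [">=", "<=", "<>"]
MULTI_DIMENSIONAL = ["time series", "scatter diagram", "histogram"]


def suggest_operations(role, properties_in_query):
    """Return the operations offered for a node with the given role.

    role: "categorical variable", "numerical variable" or
          "atomic arithmetic value".
    properties_in_query: number of properties in the current query state.
    """
    operations = []
    if role == "numerical variable":
        # Arithmetic operations only make sense for numerical variables.
        operations += STATISTICAL
    if role == "atomic arithmetic value":
        # Comparison operators are reserved for atomic arithmetic values.
        operations += COMPARISON
    if role.endswith("variable") and properties_in_query > 1:
        # Two- or three-dimensional operations need more than one property.
        operations += MULTI_DIMENSIONAL
    return operations
```

For a node with the role categorical variable and a single property in the query state, the sketch returns an empty list, which mirrors the statement that no arithmetic operations are suggested for categories.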

For example, consider the intended query as depicted in figure 8. Since the
property Radiation is classified as a numerical variable, a set of statistical
operations is suggested for selection. Once the average operation has been chosen,
the corresponding node standing for Radiation on the blackboard will be
annotated by a graphical symbol of this particular operation. Accordingly, the
natural language based interpretation of the intended query would be:

Get the average of atmospheric radiation as part of IMIS-100 data, as part
of automatically delivered measurement data, where the humidity is greater
than 50 % and temperature less than or equal to zero (0) Celsius, for the time
between 1-10-96 and 30-5-97.

Figure 8 Assigning operations to query nodes

It is also possible to assign logical operators AND or OR to nodes (terms) 
classified as concepts or properties as well as negating a term by assigning it the 
NOT operator. The latter is currently restricted to nodes standing for specified 
and/or selected atomic or interval values. Consequently, these values should be 
excluded from the query result. 

Finally, the user might submit the constructed query by choosing the execute
query option from the query menu of the blackboard. The system will convert
the graph based query into a conventional query such as SQL or OQL, where
internal mappings between the MDDQL terms and implementation symbols
are taken into account. In this paper, we cover neither the mapping issues
between MDDQL and other database specific query languages nor the
representational issues of query results, which might be presented in various
forms such as alpha-numerical and/or graphical ones, e.g. plots. We rather focus
on the visual query formalism as well as on the query construction mechanism.

4. FORMAL ASPECTS OF MDDQL 

Most visual query systems or languages, in general, have no underlying 
formal syntax definitions such as BNF notation for textual query languages, 
since the bi-dimensionality of such languages exceeds the capabilities of string- 
based grammars. The semantics is often operational and expressed by rewriting 
rules which transform the proposed language into some other well-defined target 
language. 

Since formulation of a query becomes a matter of system guidance on the
basis of semantic constraints rather than direct usage of diagrammatic representation
of conceptual schemas or other database model abstractions, MDDQL
falls into the category of generative languages as known to the AI and computational
linguistics communities. In this category of languages, the grammar
rules are used for the generation of phrases rather than for parsing of already
formulated phrases. It is out of the scope of this paper to provide a thorough
comparison with widely known generative grammars such as attribute grammars,
augmented transition networks, etc.

However, we feel the need to have a formal specification of MDDQL in terms 
of generative grammar rules in order to gain an insight into the expressiveness 
of the language as a natural sublanguage, since it is restricted to queries. These 
rules are a form of production rules, where the symbol Q on the left side must 
be interpreted as generate query. This determines what to say in addition to the 
two other stages of how to relate it to the listener or viewer - covered by the 
visual formalism - and how to map it into a string of words - covered by the 
chosen vocabulary as provided by the assigned words to terms. All three stages 
are involved in the genesis of a language. 

An overview of these production rules referring to the sequence of terms as 
members of the Domain of Interest is given in the following: 

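$$
\begin{aligned}
Q &\rightarrow GQ \mid CQ \\
GQ &\rightarrow \mathit{CONCEPT} \;\; \mathit{PROPERTY} \\
CQ &\rightarrow GQ \;\; \mathit{VALUE\text{-}DOMAIN} \\
\mathit{CONCEPT} &\rightarrow e \; \{\, \mathit{CONCEPT} \mid r \; \mathit{CONCEPT} \mid \epsilon \,\} \\
\mathit{PROPERTY} &\rightarrow p \; \{\, \mathit{PROPERTY} \mid \epsilon \,\} \\
\mathit{VALUE\text{-}DOMAIN} &\rightarrow cvd \; \{\, \mathit{VALUE\text{-}DOMAIN} \mid \epsilon \,\}
\end{aligned}
$$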

According to these production rules, a generated query Q might take the form of
a general query GQ or that of a conditioned query CQ. A general query
consists of concepts followed by properties, whereas a conditioned query is
an extension of a general query in that specific conditions are considered over
well-restricted value domains.

Assuming that E, R, P, CVD are subsets of the Domain of Interest which
is, in turn, a subset of the MDDQL alphabet, the recursive definition of the
last three production rules indicates that: (a) a starting term, standing
for a concept, is an entity $e \in E$ followed by the empty string $\epsilon$, or by a pair of
terms standing for a relationship $r \in R$ and an entity $e \in E$, respectively, or by an
entity $e \in E$; (b) a certain term, standing for a property, is a property $p \in P$
followed by the empty string or by a term standing, in turn, for a property. Finally, a
term $cvd \in CVD$ follows a property in a conditioned query, where $cvd$ might
be followed, in turn, by the empty string or by another $cvd \in CVD$.
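
To make the generative reading of these rules concrete, the following sketch derives a term sequence from user-chosen instantiations of e, r, p and cvd. It is a simplified illustration rather than the MDDQL implementation: the function names and the representation of a concept chain as (relationship or None, entity) pairs are assumptions made here, and none of the semantic constraints discussed in section 3 are checked.

```python
def derive_concept(chain):
    """CONCEPT -> e {CONCEPT | r CONCEPT | epsilon}.

    chain: list of (relationship or None, entity) pairs; the first pair
    must carry no relationship because a query starts with an entity.
    """
    terms = []
    for position, (relationship, entity) in enumerate(chain):
        if position == 0 and relationship is not None:
            raise ValueError("a query must start with an entity")
        if relationship is not None:
            terms.append(relationship)
        terms.append(entity)
    return terms


def generate_query(chain, properties, value_domains=()):
    """Q -> GQ | CQ, returning the kind of query and its term sequence."""
    if not chain or not properties:
        raise ValueError("GQ -> CONCEPT PROPERTY needs an entity and a property")
    # GQ: concepts followed by properties.
    terms = derive_concept(chain) + list(properties)
    if value_domains:
        # CQ: a general query extended by well-restricted value domains.
        return "CQ", terms + list(value_domains)
    return "GQ", terms


kind, terms = generate_query(
    [(None, "IMIS-100 data"), (None, "Atmospheric data"),
     ("measured by", "Measurement stations")],
    ["Temperature"],
    ["[-20, 10] Celsius"],
)
```

Here `kind` is "CQ" because a value domain was supplied; leaving the last argument out yields a general query GQ over the same concepts and properties.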

These recursive definitions enable the consideration of complex concepts,
properties and/or well-restricted value domains within a query. The symbols
e, r, p, cvd are not terminal symbols of the grammar but rather variables to be
instantiated by the user. A possible alternative would be to assign conditions for
applying a rule and/or associate attributes with the non-terminal symbols. This technique
led to the definition of attribute grammars (Knuth, 1968), where the conditions
operate on attribute-value records. However, since the conditions strongly
depend on instances (particular terms) and/or the roles of terms, it would be
cumbersome or even impossible to express these conditions within the context
of production rules.

5. CONCLUSION 

We presented a meta-data driven query language as a visual query 
language. End users do not need to make use of a language syntax and/or fully 
understand the database model, in terms of conceptual schema, attribute and 
data value interpretations, in order to pose queries to databases, since these 
interpretations are explicitly embedded within and provided by the MDDQL 
terms. This also enables the usage of any natural language when queries are 
constructed. 

Additionally, in order to reduce the effort of query construction in terms of 
large and, somehow, difficult to cope with conceptual schemas, the construction 
of the intended queries is done by system guidance relying on inferences based 
on a) background knowledge as expressed by structural and connectionism 
issues of the MDDQL terms, b) the current query state. 

Abstract Database management systems for multimedia data retrieval are becoming
more important, as digital videos and cameras increase in popularity.
An important feature of multimedia data retrieval is that users rarely 
specify their first queries exactly, and must clarify what they want by 
browsing the query results, refining their query by trial and error. It 
is therefore desirable for a multimedia database management system 
(DBMS) to develop a rough query result quickly and refine it over time. 
This paper describes a hierarchical space model for the multimedia data 
retrieval that is similar to that of the human memory hierarchy. The aim 
of the hierarchical space model is to improve the similarity retrieval’s 
performance with little loss in query result quality. We implemented 
the hierarchical space model on the ORDBMS LiteObject and applied 
it to an image retrieval application. The results of this test proved the 
efficiency of our hierarchical space model. 



1. INTRODUCTION 

Database management systems for multimedia retrieval are becoming
more important because digital cameras and videos are increasing in popularity
and because huge amounts of music and movie data (collectively
termed multimedia data) are now being stored in electronic format. An
important feature of multimedia data retrieval (Gupta and Jain, 1997)
is that users rarely have a clear idea of what they wish to retrieve and
must clarify their queries by browsing the query results and refining the
query condition by trial and error.

For example, suppose that you want to search for clip-art images
containing a man operating a computer. You will get many clip-art
images in the query result by using a multimedia DBMS. However, were
you to review all of these images, you would probably find that only a
few of them would fit what you had in mind. There are several possible
reasons for this result.

1 You don’t have a clear idea of what you want and the above condition
“image containing a man who operates a computer” is not
exact enough. For example, what you really want may be an image
containing a man with a mobile computer or a man at work in a
computer room. However, you don’t realize this until you have browsed
the images.

2 Even if you have a clear idea of what you want, it is difficult to 
construct a search condition embodying it. For example, suppose 
you want an image containing a man with a mobile computer run- 
ning down Wall Street in New York on a rainy day. It would be 
difficult to translate such a query to a proper search condition that 
fits the search engine’s format, or to construct a search condition 
with the proper key images. 

3 Even if you have a clear idea of what your search condition should 
be and can construct a search condition embodying your idea, you 
probably won’t get a correct result because the multimedia DBMS 
technology has not matured enough. 

This paper focuses on the first type of data retrieval problem. One
way to solve it is to have the multimedia DBMS develop a query result
quickly with little loss of query result quality, have the user browse
the result, and then have the user submit a refined query (we refer to the
first query as rough retrieval and to the conventional method as precise
retrieval). There are three approaches to achieving this rough retrieval.

1 A hierarchical space stores instances in a hierarchy that is a metaphor
of the human memory hierarchy. Basically, this approach places
important instances in upper layers and less important instances
in lower layers. Thus, rough retrieval can be achieved by limiting
the target instances to those managed in the upper layers.

2 A hierarchical data structure stores an instance at several different 
resolution levels. FlashPix format and thumbnail data are exam- 
ples of different resolution levels. Thus, rough retrieval can be done 
by executing a query using the rough resolution level data. An- 
other way to do rough retrieval is by storing feature vector data ex- 
tracted from each instance at several different levels. Thus, rough 
retrieval can be done by executing a query that uses rough feature 
vector data. 

3 Rough retrieval can be done by tuning the parameters of a multidimensional
index algorithm like R-tree (Guttman, 1984).

This paper describes a hierarchical space model based on the hierarchical
space approach. The main feature of the space model is the
introduction of a layer concept to divide the instances into several collections
of instances for each class, whereas conventional models manage
all instances in a single space. The search space of the hierarchical space
model is reduced by specifying the upper layer, and query result
quality is maintained by using appropriate division rules.

2. HIERARCHICAL SPACE MODEL 
2.1. DATA MODEL 

Class. Class is a template of an instance (class instantiates an
instance) and is composed of name, ID, the maximum depth of the
hierarchy (depth), a collection of attributes (attributes), and a collection
of layers (layers).

Each attribute is composed of name and type, and each method is composed
of name and return type.

$$\mathit{class} = (\mathit{name}\ \mathit{string},\ \mathit{ID}\ \mathit{integer},\ \mathit{depth}\ \mathit{integer},\ \mathit{attributes}\ \mathit{Set}(\mathit{attribute}),\ \mathit{layers}\ \mathit{Set}(\mathit{layer}))$$

$$\mathit{attribute} = (\mathit{name}\ \mathit{string},\ \mathit{type}\ \mathit{type})$$

where $\mathit{Set}(v)$ is a value collection of type $v$.



Layer. Layer manages a collection of instances (instances), the
depth of itself in the hierarchy (number), the reference to the class that
the instances are instantiated from (class), and the layer type (type).

$$\mathit{layer} = (\mathit{number}\ \mathit{integer},\ \mathit{instances}\ \mathit{Set}(\mathit{instance}),\ \mathit{class}\ \mathit{class},\ \mathit{type}\ \mathit{ltype})$$

There are two types of layers: 1) real layers for persistent instances, 
and 2) virtual layers that are created by performing layer operations on 
existing layers. A number is given to each layer that indicates its depth 
in the hierarchy. 
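
The class and layer definitions above can be written down as plain data types. The sketch below is one possible reading and not LiteObject's internal representation: the Python names, the `LayerType` enumeration and the choice to create one real layer per depth level (numbered from 0) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class LayerType(Enum):
    REAL = "real"        # holds persistent instances
    VIRTUAL = "virtual"  # produced by a layer operation


@dataclass(frozen=True)
class Attribute:
    name: str
    type_: type


@dataclass
class Layer:
    number: int                                   # depth in the hierarchy, 0 = top
    instances: set = field(default_factory=set)
    # Back-reference to the owning class; excluded from comparison and repr
    # so that the Class <-> Layer cycle causes no recursion.
    cls: "Class | None" = field(default=None, compare=False, repr=False)
    ltype: LayerType = LayerType.REAL


@dataclass
class Class:
    name: str
    ID: int
    depth: int                                    # maximum depth of the hierarchy
    attributes: frozenset = field(default_factory=frozenset)
    layers: list = field(default_factory=list)

    def __post_init__(self):
        # Assumption: a class with depth d owns d real layers, 0 .. d-1.
        if not self.layers:
            self.layers = [Layer(number=n, cls=self) for n in range(self.depth)]


color = Class("Color", 1, 3, frozenset({Attribute("name", str)}))
color.layers[0].instances.update({"red", "green"})
```

Each layer keeps a reference to its class, so a query restricted to `color.layers[0]` only sees the instances placed in the top layer.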

There are two types of useful hierarchies.

1) The recall/forget hierarchy represents the level of importance of an
instance. The less important an instance is, the lower it goes in the
hierarchy. Figure 1 shows an example of this type of hierarchy (storing
database research class instances) in which multimedia is the most
important instance and OLTP and RDB are the least important instances
in the database.

2) The abstract/concrete hierarchy represents the level of abstraction of
an instance. The more abstract an instance is, the higher it is placed
in the hierarchy. Figure 2 shows an example of this type of hierarchy
(storing color class data) in which red and green are in the highest layer
of abstraction and pink and dark red are in lower layers of abstraction.
The parent/children association between instances is a self-association
in this case.

Figure 1 Recall/forget level hierarchy

Figure 2 Abstract/concrete level hierarchy (Color class)

The association and association instance are described below to sup- 
port the abstract/concrete hierarchy. 

Association. Association is a directional reference between two
classes and is a template of an association instance (association instantiates
association instance). It is composed of name, ID, the
class reference of its source class (from), the class reference of its destination
class (to), and the association type (astype). The association type can be
one to one, one to many, many to one, or many to many. The operation
called traverse (represented as trav) navigates from a source instance to
destination instances through the association.

$$\mathit{association} = (\mathit{name}\ \mathit{string},\ \mathit{ID}\ \mathit{integer},\ \mathit{from}\ \mathit{class},\ \mathit{to}\ \mathit{class},\ \mathit{astype}\ \mathit{astype})$$

$$\mathit{trav} : \mathit{association} \times \mathit{instance} \rightarrow \mathit{Set}(\mathit{instance})$$

2.2. OPERATIONS 

Union. The union operation generates a virtual layer from two 
layers. The generated layer manages the instances that form the union 
of the layers’ instances. 

$$\mathit{layer}(n, \mathit{inses}_n) \;\mathit{union}\; \mathit{layer}(m, \mathit{inses}_m) = \mathit{layer}(l, \mathit{inses}_n \;\mathit{union}\; \mathit{inses}_m)$$

Difference. The difference (represented as $-$) operation generates
a virtual layer from two layers. The generated layer manages the
instances of the first layer that are left over after the instances that are
common to both layers are removed.

$$\mathit{layer}(n, \mathit{inses}_n) - \mathit{layer}(m, \mathit{inses}_m) = \mathit{layer}(l, \mathit{inses}_n - \mathit{inses}_m)$$

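Selection. The selection (represented as $\sigma$) operation generates
a virtual layer from another layer. The generated layer manages the
instances that satisfy a specified predicate.

$$\sigma_p(\mathit{layer}(n, \mathit{inses}_n)) = \mathit{layer}(l, \mathit{inses}_l)$$

where $\mathit{inses}_l \subseteq \mathit{inses}_n$ and $\forall \mathit{ins} \in \mathit{inses}_l.\ p(\mathit{ins}) = \mathit{true}$.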
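
The three operations that generate virtual layers map directly onto set operations. The sketch below assumes a minimal `Layer` record with a `virtual` flag; since the model does not say how the number l of the generated layer is chosen, the sketch simply takes it as a parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    number: int
    instances: frozenset
    virtual: bool = False


def union(a, b, number):
    """Virtual layer holding the instances of either layer."""
    return Layer(number, a.instances | b.instances, virtual=True)


def difference(a, b, number):
    """Virtual layer holding the instances of `a` that are not in `b`."""
    return Layer(number, a.instances - b.instances, virtual=True)


def select(predicate, layer, number):
    """Virtual layer holding the instances that satisfy `predicate`."""
    chosen = frozenset(i for i in layer.instances if predicate(i))
    return Layer(number, chosen, virtual=True)


upper = Layer(0, frozenset({"multimedia"}))
lower = Layer(1, frozenset({"OLTP", "RDB"}))
everything = union(upper, lower, number=0)
short_names = select(lambda name: len(name) <= 4, everything, number=0)
```

Because the results are flagged as virtual, they can be combined further without touching the real layers that hold the persistent instances.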



Redistribute. The redistribute (represented as redist) operation 
redistributes instances from one specified layer to another specified real 
layer. 

$$\mathit{redist}(\mathit{targetLayer}\ \mathit{layer},\ \mathit{toLayer}\ \mathit{layer})$$

This operation deletes all instances of the targetLayer in all real layers 
except the toLayer and moves all instances of the targetLayer to the 
toLayer. 

3. IMPLEMENTATION 

We implemented the hierarchical space model on the ORDBMS LiteObject.
This section describes the details of the implementation, including
the redistribution process in two types of hierarchies, and gives the results of
an experiment conducted on the implementation.

3.1. REDISTRIBUTION IN HIERARCHIES 

Recall/forget hierarchy. We implemented this hierarchy by storing
the number of times each instance is accessed and by using an embedded
function getAccessCount() to get this number. For example, in
Figure 1, suppose that the DB research class has two layers and that you
want to redistribute instances using the rules given below. If the number
of times an instance is accessed is larger than a specified threshold
(ten in the example), the instance is moved to layer 0, and the rest of the
instances are moved to layer 1. This function is represented by the SQL
statements below.

redist class DB Research layer 0,1 where getAccessCount() > 10 to layer 0;
redist class DB Research layer 0,1 where getAccessCount() <= 10 to layer 1;
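
The effect of this redistribution rule can be sketched outside the DBMS as follows. The function `redistribute_by_access_count` and the dictionary of access counts are hypothetical stand-ins for the redist statement and getAccessCount(); the counts themselves are made-up illustration values, not measurements from the paper.

```python
def redistribute_by_access_count(layers, access_count, threshold=10):
    """Split all instances of a two-layer class by access frequency.

    layers: list of sets of instances (the current layers 0 and 1).
    access_count: function returning how often an instance was accessed.
    Returns [layer 0, layer 1] after redistribution.
    """
    everything = set().union(*layers)
    # Instances accessed more often than the threshold go to layer 0.
    upper = {inst for inst in everything if access_count(inst) > threshold}
    # The rest of the instances go to layer 1.
    return [upper, everything - upper]


# Illustration values only.
counts = {"multimedia": 25, "OLTP": 3, "RDB": 10}
new_layers = redistribute_by_access_count(
    [{"OLTP"}, {"multimedia", "RDB"}], counts.get
)
```

An instance whose count equals the threshold stays in the lower layer, because the rule moves only instances accessed more than ten times.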

Abstract/concrete hierarchy. An example of this type of hierarchy
is shown in Figure 2. Suppose that the Color class in the figure
has several layers and that you want to redistribute instances according
to their depth along the path of the parent/children association. In
this case, instances that don’t have parents are moved to layer 0,
instances that are traversable via the parent/children association from
instances in layer 0 are moved to layer 1, and so forth. Those instances
whose path lengths exceed the maximum depth of the Color class hierarchy
are moved to the bottom layer. This function is represented by
the SQL statement below.

redist class Color via parent/children;
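
Redistribution along the parent/children association amounts to a breadth-first traversal that records the depth of each instance. The sketch below illustrates this under assumptions: the function name, the dictionary representation of the association and the example colors below pink are hypothetical, and which colors are children of red is only a guess from the description of Figure 2.

```python
from collections import deque


def redistribute_by_depth(instances, children, max_depth):
    """Assign each instance to the layer given by its depth.

    instances: iterable of all instances of the class.
    children: dict mapping an instance to its child instances.
    max_depth: number of layers; deeper instances go to the bottom layer.
    Returns a list of sets, one per layer, with layer 0 first.
    """
    has_parent = {child for kids in children.values() for child in kids}
    roots = [inst for inst in instances if inst not in has_parent]
    layers = [set() for _ in range(max_depth)]
    seen = set(roots)
    queue = deque((root, 0) for root in roots)
    while queue:
        inst, depth = queue.popleft()
        # Paths longer than the hierarchy end up in the bottom layer.
        layers[min(depth, max_depth - 1)].add(inst)
        for child in children.get(inst, ()):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return layers


colors = ["red", "green", "pink", "dark red", "pale pink"]
parent_children = {"red": ["pink", "dark red"], "pink": ["pale pink"]}
color_layers = redistribute_by_depth(colors, parent_children, max_depth=2)
```

With two layers, the hypothetical instance pale pink has a path length of two and is therefore clamped into the bottom layer together with pink and dark red.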

3.2. IMAGE RETRIEVAL APPLICATION 

Retrieval process. First, we construct an initial query that has a 
rough condition because, as explained in section 1, we still don’t have 
a clear idea of what we want at this point. Second, we do a rough 
retrieval using the initial query condition and get the result. Third, we 
refine the query condition by browsing the result and then execute the 
refined query. We repeat this step until we get the desired result. 

Rough retrieval is useful until the query is clarified, because it returns 
an answer faster than conventional precise retrieval. 

Image retrieval application. Figure 3 shows the image retrieval
process of ExSight. The left side of the figure shows the GUI of the
ExSight viewer, and the right side shows LiteObject with an access
method for still images. The retrieval process is as follows.



1 The user inputs key images into the search condition area of the
ExSight viewer and specifies the target layers that represent the
rough level of the query. For example, the search condition area in
Figure 3 shows a situation in which the user is searching for images
containing both a boy’s face and a blue shirt and has specified
layers 0 and 1 (of a three layer hierarchy).

Figure 3 ExSight image retrieval process

2 The ExSight viewer submits a query statement to LiteObject.
To speed up processing, sub-image instances and their feature vectors
are extracted from each image instance, and multidimensional
indices are constructed from the feature vectors (hue, saturation,
intensity, outer-shape, size, position) (Curtis et al., 1997).

3 LiteObject processes the query statement using the specified
layers and the multidimensional indices and returns the query result.

4 The ExSight viewer displays the query result. For example, the
query result area in Figure 3 shows the top two image instances
and highlights the border of the sub-image instances that satisfy
the query condition.



3.3. EXPERIMENTS 

We compared the performance of the hierarchical space model with 
that of the conventional model. 

Environment. Sun Ultra60 (UltraSPARC-II 360 MHz × 2), OS
Solaris 2.6, memory 2 GB.

Data. We used 996 PhotoDisc(R) images (scenery, food, people 
playing sports and so forth) and 96061 sub-images that were extracted 
from these images. 

Queries. We chose 18 queries that returned good results in the last
step of the retrieval process in section 3.2. The queries were categorized
according to two patterns: 1) images searched using a condition with
image keys (three queries) and 2) images and sub-images searched using
a condition with sub-image keys (fifteen queries).

Table 1 Experimental Result

model          image         sub-image         response time (sec)
conventional   996           96061             37.3
hierarchical   {619, 377}    {1651, 94410}     2.2


Evaluation method. We evaluated the two models as follows. 

Conventional: We measured the total response time of all 18 SQL 
queries after building multidimensional indices. 

Hierarchical: We distributed instances according to the number of ac- 
cesses each of the 18 SQL queries had for each instance, and then 
measured the total response time of all 18 SQL queries after build- 
ing multidimensional indices and specifying the upper layer. 

Experimental result. Table 1 shows the result of the redistribution.
The image hierarchy comprised 377 instances in the upper
layer and 619 instances in the lower layer, and the sub-image hierarchy
comprised 1651 instances in the upper layer and 94410 instances in the
lower layer.

Our hierarchical model was about 17 times faster than the conven- 
tional model because the search space of the hierarchical model was 
reduced by specifying the upper layer. In addition, there was little dif- 
ference between the query results of the two models, which demonstrates 
that the hierarchical model didn’t degrade the query result quality. ^ 
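
The reported factor can be checked directly from the figures in Table 1. The short calculation below uses only numbers stated in the table and the text; the variable names are introduced here for illustration.

```python
# Figures taken from Table 1 and the surrounding text.
conventional_sec = 37.3
hierarchical_sec = 2.2
speedup = conventional_sec / hierarchical_sec

# Share of instances that remain in the search space (upper layer only).
images_upper, images_total = 377, 996
sub_images_upper, sub_images_total = 1651, 96061
image_share = images_upper / images_total
sub_image_share = sub_images_upper / sub_images_total

print(f"speedup: {speedup:.1f}x")
print(f"upper-layer share of images: {image_share:.1%}")
print(f"upper-layer share of sub-images: {sub_image_share:.1%}")
```

The ratio of the two response times is roughly 17, and the upper layers hold well under half of the images and only a small fraction of the sub-images, which is consistent with the explanation that the gain comes from the reduced search space.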

4. DISCUSSION 

Although the implemented division rule is rather simple (based on 
the number of times each instance is accessed), the experimental re- 
sult demonstrated the efficiency of our hierarchical model for the image 
retrieval application ExSight. There are two reasons for this result. 



^What difference there was was caused by changes in the target instances, which altered the 
multidimensional indices’ structure. 

■ There are instances that are useless or that have never been searched,
and those instances are redistributed into the lower layers of the
hierarchy. For example, some of the sub-images that ExSight
extracts are useless (e.g., a very small extracted sub-image is almost
always useless).



■ The number of times an instance appears in the query results (in- 
stance frequency) varies from instance to instance. For example, 
if users tend to search newer data more often than older data, it is 
possible to improve search performance by redistributing the newer 
data to the upper layers and the older data to the lower layers. 



On the other hand, our model does not improve performance when the 
instance frequency does not vary much from instance to instance because 
there is no effective layer division in this case. 

There are several problems that remain to be solved. 

1. Humans tend to remember what is very important or impressive
to them. The implementation of the recall/forget hierarchy does not
rank instances according to such a subjective measure; it only ranks an
instance according to the number of times it was accessed or by the last
time it was accessed. Relevance feedback or other techniques will
solve this problem.

2. The implementation of the abstract/concrete hierarchy redistributes
instances according to their depth along the path of a specific
association. This is useful, but it is not easy to move each instance to
the desired layer. In this case, we have to add an extra attribute to each
instance and give a specific redistribution rule, even though it is
time-consuming work.

3. The hierarchical space model cannot be adjusted to fit any user
profile or context because it handles only one hierarchy at a time. Extending
the hierarchical model so that it can support several hierarchies
at once will enable effective user-dependent or context-dependent rough
retrieval.

4. It is not easy for users to specify which layers in an SQL statement
would be best for conducting the rough retrieval. We should build a
translator that takes as input the desired response time or the preciseness
of the retrieval and calculates the optimum number of layers for the query.

5. CONCLUSION

This paper described a hierarchical space model for multimedia data 
retrieval that is suitable for users who generally clarify their queries by 
browsing the query results and refining the query conditions. The main 
feature of the model is the introduction of a layer concept to divide the 
instances into collections of instances for each class in a manner similar 
to that of the human memory hierarchy. We defined the data model 
and operations based on this layer concept and described two typical 
hierarchies: recall/forget hierarchy and abstract/concrete hierarchy. We 
implemented the hierarchical model on the ORDBMS LiteObject using 
the extended query language and applied it to ExSight for PhotoDisc(R) 
image data. Our space model improved the similarity retrieval’s perfor- 
mance with little loss in retrieval quality. 

Abstract Clustering is one of the most important topics in the field of knowledge
discovery from databases. Specifically, hierarchical clustering is useful
because it can be used to interactively guide users in browsing a huge
database. In many cases, database clustering can be modeled as a
graph partitioning problem, because a database with a distance function
defined on it can be regarded as an edge-weighted graph. Constructing
an MST (Minimal Spanning Tree) is therefore a possible solution to
this problem.

In this paper, we propose an efficient MST construction method for a 
database with an arbitrary distance function on it. Our method utilizes 
a metric index to reduce the number of distance calculations needed to 
construct an MST. For this purpose, we introduce a new metric index 
named metric matrix. Experimental results show that our method 
can reduce the number of distance calculations needed in comparison 
with the classical method. 

Keywords: Clustering, MST, Metric Index, Data Mining, Knowledge Discovery from
Databases
1. INTRODUCTION

Clustering is one of the most important topics in the field of knowledge
discovery from databases. Clustering algorithms can be classified
into two types, partitional and hierarchical. Hierarchical clustering
is particularly useful because it can be used to guide users in exploring
huge databases. Since distance functions are often defined on databases
to measure dissimilarities between data objects, we can derive an
edge-weighted complete graph from a database by mapping objects and
distances to vertices and weights of edges, respectively. Clustering of
a database can then be modeled as a partitioning problem of a graph derived
from the database and a distance function defined on it.

MST (Minimal Spanning Tree) construction is a possible solution to
the graph partitioning problem. Figure 1 illustrates why. The figure
depicts an MST of a graph and its representation as a tree called a
dendrogram. Notice that a dendrogram can be viewed as a hierarchical
clustering structure of the graph. Although well-known classical
algorithms for MST construction, such as Prim’s and Kruskal’s algorithms,
are available, they are not efficient for many recent applications, such
as image databases and video databases. Since a distance calculation tends
to be expensive in such applications and graphs derived from databases are
complete, the number of distance calculations needed for MST construction
amounts to nearly $|V|^2/2$, where $|V|$ is the number of vertices of the
graph.

Figure 1 MST of a Graph and its Dendrogram

In order to reduce the number of expensive distance calculations, we
propose an efficient MST construction method, which utilizes a metric
index. This method can be applied to any graph whose edge weights
satisfy the metric postulate. In our method, near neighbour
searches centered at the same vertices are repeatedly performed by
incrementally changing the radius using a metric index. Therefore, a
straightforward application of existing metric indices, such as the MVP-tree
(Tolga et al., 1997) and the MI-tree (Ishikawa et al., 1999), to our method
necessitates redundant distance calculations.

In order to overcome this difficulty, we introduce a new metric index
for MST construction named metric matrix, which supports radius-incremental
differential near neighbour searches and eliminates redundant
distance calculations in the course of MST construction.

2. APPROACH 

2.1. MST PROBLEM 

Let $G = (V, E, \phi)$ be an undirected, connected, edge-weighted graph,
where $V$ is a set of vertices, $E$ ($\subseteq V \times V$) is a set of edges,
and $\phi : E \to \mathbb{R}$ is a real-valued weight function. If an edge
exists for any pair of vertices, G is said to be complete. A spanning tree
of G is a connected graph $G_{ST} = (V, E_{ST}, \phi)$ such that
$E_{ST} \subseteq E$ and $|E_{ST}| = |V| - 1$ (this means that $G_{ST}$ has no
cycle). An MST of a graph G is a spanning tree of G in which the sum of
weights in the tree ($W = \sum_{e \in E_{ST}} \phi(e)$) is minimal.
The problem of constructing an MST of a given edge-weighted connected
graph is called the MST problem (Tarjan, 1983).

When the weight function $\phi$ of G is a metric, i.e., $\phi$ satisfies the metric
postulate (positivity, symmetry and the triangle inequality), G is called
a metric graph.

2.2. GREEDY METHOD 

The greedy method is a simple generic technique to solve the MST
problem (Tarjan, 1983). We summarize this method here from the viewpoint
of clustering.

The method starts with a disjoint set of clusters, in which each
object forms an individual cluster. Each cluster is associated with a
set of edges, called its front line, and each edge in a front line connects
a vertex inside the cluster with a vertex outside the cluster. A safe
edge is one of the shortest edges (edges with minimal distance) in the
front line. Two clusters connected by a safe edge are then successively
merged until only one cluster remains. Exactly $|V| - 1$ edges are used to
merge clusters, and they constitute an MST. Note that the MST construction
process can be viewed as a process of hierarchical agglomerative
single-linkage clustering. By specifying a procedure to choose the two
clusters to be merged in each step, a specific variation, such as Prim’s
or Kruskal’s method, is derived.
Assuming that the time complexity of a single distance calculation is $O(1)$,
the gross time complexity of the greedy method is $O(|V|^2 \log |V|)$ when
applied to a complete graph. The $|V|^2$ part of this order is due to the fact that
the number of edges in a complete graph with $|V|$ vertices is $|V|(|V|-1)/2$.

Thus, the total cost of MST construction becomes prohibitive when a 
classical method is naively applied to a complete graph with an expensive 
distance function. 
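
As an illustration of the classical approach only (not the method proposed in this paper), the following minimal Python sketch applies a Kruskal-style greedy method to a complete graph. The function name `kruskal_mst`, the union-find helper and the toy one-dimensional data are assumptions made for this example. Note that every one of the $|V|(|V|-1)/2$ distances is computed before any merging starts.

```python
from itertools import combinations


def kruskal_mst(points, dist):
    """Return MST edges (u, v, weight) of the complete graph over `points`."""
    n = len(points)
    # Classical approach: compute the distance of every pair of objects.
    edges = sorted((dist(points[u], points[v]), u, v)
                   for u, v in combinations(range(n), 2))
    parent = list(range(n))  # each object starts in its own cluster

    def find(x):
        # Find the representative of the cluster containing x.
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    mst = []
    for w, u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:             # shortest remaining edge between two clusters
            parent[ru] = rv      # merge the two clusters (single linkage)
            mst.append((u, v, w))
            if len(mst) == n - 1:
                break
    return mst


# Toy data: four points on a line, absolute difference as the metric.
points = [0.0, 1.0, 3.0, 7.0]
mst = kruskal_mst(points, lambda a, b: abs(a - b))
total_weight = sum(w for _, _, w in mst)
```

The order in which edges enter `mst` corresponds to the merge order of a single-linkage dendrogram.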

2.3. OUR APPROACH 

To address the problem pointed out so far, our work aims to reduce 
the number of expensive distance calculations needed to construct an 
MST of a metric complete graph. 

Our approach is founded upon the presumption that: 

ideally, only edges contained in the constructed MST are needed for the 
MST construction. 

In other words, edges not contained in the constructed MST need not
be touched in the course of the MST construction. However, in reality,
we have to examine other edges to ensure that the edges at hand are really
the edges to be contained in the final MST.

Since the edges needed to construct an MST are safe edges, longer edges (edges
with larger distances) are less likely to be selected from front lines in
each step. Therefore, some longer edges could be ignored and the distance
calculations for them could be omitted in the course of MST construction.
We expect that metric indices enable us to exclude longer edges
from front lines and to reduce the number of distance calculations needed
for MST construction.

Metric Indices and Near Neighbour Searches. 

The point of our approach to reducing the number of distance
calculations is to reduce the size of front lines by excluding longer edges
from them. This can be realized by utilizing a metric index. Several
metric indices have been proposed for near neighbour searches in arbitrary
metric spaces (Tolga et al., 1997; Brin, 1985; Ciaccia et al., 1997;
Ishikawa et al., 1999), and they support r-neighbour searches.

Definition 1 (query region of an object q and radius r) Given a
query object $q$ ($\in V$) and radius $r$ ($> 0$), the r-neighbour of q is called the
query region of q and r.

Definition 2 (r-neighbour search)

Given a query object $q$ ($\in V$) and radius $r$ ($> 0$), an r-neighbour search of
q with radius r retrieves the objects inside the query region of q and r, i.e.,
$\{v \mid v \in V \land \phi(q, v) < r\}$.

All existing metric indices utilize the triangle inequality to filter out
objects outside the query region of q and r. However, such filtering is
not sufficient, because some objects outside the query region cannot be
filtered out. To remove such objects, the distances between the query object
and the objects passing through the filter have to be calculated. Almost all
distance calculations in a metric index are performed for this refinement
purpose.
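
The filter-then-refine pattern described above can be sketched as follows. This is a simplified single-pivot illustration, not the structure of any particular index cited here; the function name `r_neighbour_search`, the single reference object `pivot` and the toy data are assumptions of this example.

```python
def r_neighbour_search(objects, dist, pivot, pivot_dists, q, r):
    """Return (objects within distance r of q, number of refinement distance calls).

    pivot_dists[i] is the precomputed distance from `pivot` to objects[i].
    """
    R = dist(q, pivot)  # one distance calculation to position the query
    refinements = 0
    result = []
    for obj, d_pivot in zip(objects, pivot_dists):
        # Filter: the triangle inequality gives |R - d_pivot| <= dist(q, obj),
        # so an object with |R - d_pivot| >= r cannot lie in the query region.
        if abs(R - d_pivot) >= r:
            continue
        # Refinement: survivors may still be outside the query region,
        # so the real distance has to be calculated.
        refinements += 1
        if dist(q, obj) < r:
            result.append(obj)
    return result, refinements


def absolute_difference(a, b):
    return abs(a - b)


objects = [0, 2, 9, 10, 20]
pivot = 5
pivot_dists = [absolute_difference(pivot, o) for o in objects]
found, refinements = r_neighbour_search(
    objects, absolute_difference, pivot, pivot_dists, q=8, r=2.5)
```

In this toy run the filter discards only the object 20; the objects 0 and 2 pass the filter although they are far from the query, and are rejected only by the refinement step.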

3. MST CONSTRUCTION WITH METRIC 
INDEX 

In the classical greedy method described in 2.2, each cluster has a front 
line and the front line of each cluster contains every edge connecting a 
vertex inside the cluster and a vertex outside the cluster. However, all 
edges except the shortest one in each front line can be omitted, since 
only the shortest edge is needed to merge clusters in each step. 

We aim to exclude such edges from front lines by using a metric index. 
Figure 2 shows the framework of our method obtained from the classical 
framework by modifying the following points: 

1 initially, front lines contain no edge (line 5), and 

2 front lines are updated after cluster selection (line 12). 

In the method C.updateFrontLine() invoked at line 12, near neighbour
searches by a metric index are performed to retrieve short edges
to ensure that the shortest edges are contained in the front line. This
is, so to speak, a search-on-demand strategy. Notice that, as the MST
construction process proceeds, near neighbour searches centered at the same
vertices may be performed repeatedly, increasing the radii stepwise,
to update front lines.

Performance. 

Two factors play important roles in the performance analysis. One
is the number of edges to be contained in front lines, and the other is
the number of near neighbour searches performed to update front lines.
The former affects the number of distance calculations and is strongly
dependent on the performance of the metric index used, while the latter is
principally dependent on the algorithm of the method P.selectCluster().
Thus, P.selectCluster() determines the characteristics of each variation.
More precisely, the algorithm of P.selectCluster() determines
the sequence of clusters, which governs the principal characteristics of
each variation.

V   - Vertices of the graph
P   - Disjoint set of clusters (partition)
C   - A cluster (pair of a set of vertices C_V and an associated
      front line C_F)
C_V - Vertices in a cluster
C_F - Front line of a cluster (edges between C_V and V \ C_V)
e   - An edge (triplet of two vertices u, v and φ(u, v))
MST - Set of edges constructing an MST

 1: // Initialization of the partition
 2: P ← ∅
 3: for all u ∈ V do
 4:   C_V ← {u}
 5:   C_F ← ∅  // initially, front line is empty
 6:   P.append((C_V, C_F))
 7: end for
 8: // MST construction by merging clusters
 9: MST ← ∅
10: while |MST| < |V| − 1 do
11:   (C_V, C_F) ← C ← P.selectCluster()
12:   C.updateFrontLine()  // front line update
13:   (u, v, dist) ← e ← C_F.findSafeEdge()
14:   MST.append(e)
15:   P.mergeClusters(e)
16: end while
17: return MST

Figure 2 Framework for Our Method

In the process of MST construction, the following steps are repeated
exactly $|V| - 1$ times:

1 select a cluster $C_i$ from the partition P (by P.selectCluster()),

2 find a safe edge $e_i$ from the front line of $C_i$ (by C_F.findSafeEdge()),

3 merge two clusters connected by $e_i$ (by P.mergeClusters($e_i$)).

We get a sequence of selected clusters

$C_0, C_1, C_2, \ldots, C_{|V|-2}$.

Since near neighbour searches are possibly performed for every vertex
in the selected cluster in the method C.updateFrontLine(), the total
number of near neighbour searches performed (let it be $S_N$) is
proportional to the sum of the numbers of vertices in the selected clusters, i.e.,

$S_N \propto \sum_{i=0}^{|V|-2} |C_i|$.
4. THE METRIC MATRIX

Existing metric indices cannot avoid duplication of distance calcu- 
lations for the same edges in the course of MST construction. Hence 
the total number of distance calculations could not be reduced in spite 
of the reduction of the number of edges for which distances should be 
calculated. 

What we really want is to invoke successive near neighbour range
searches centered at the same query objects (see Figure 3). In order to
avoid duplication of distance calculations in successive near neighbour
range searches, we introduce a new metric index named metric matrix.
The metric matrix supports the following searches.

Figure 3 Radius-Incremental Near Neighbour Range Searches

Definition 3 (r-nearby search)

Given a query object $q$ ($\in V$) and radius $r$ ($> 0$), an r-nearby search of q
with radius r retrieves a set of objects that includes the objects inside
the query region of q and r. In other words, the objects retrieved by an
r-nearby search include the objects retrieved by an r-neighbour search
with the same parameters.

Definition 4 (differential r-nearby search)

Given a query object $q$ ($\in V$) and radius $r_{i+1}$ ($> 0$), a differential
r-nearby search retrieves a set of objects that includes the objects inside
the query region of q and $r_{i+1}$. If a differential r-nearby search of q with
radius $r_{i+1}$ is preceded by an r-nearby search of q with radius $r_i$ ($< r_{i+1}$),
it never retrieves the objects already retrieved by the preceding search with
radius $r_i$.

By using differential r-nearby searches for successive near neighbour
searches in the course of MST construction, we can completely eliminate
duplications of distance calculations.
Every existing metric index performs the following steps in the process
of r-neighbour searches:

1 candidate selection (pruning or filtering) and 

2 candidate refinement (remove objects outside the query region). 

The first step is principally accomplished by using the triangle inequality. 
The second step eliminates objects residing outside the query region. 
In the second step, distances between the query object and candidate 
objects must be calculated. The metric matrix, on the other hand, skips
the second step, so it needs distance calculations only in the first step.
This is why the set of objects retrieved by an r-nearby search
can contain objects outside the query region.

A metric matrix consists of several metric vectors. We call the number
of metric vectors in a metric matrix its degree. Every metric vector
is associated with a reference object. A metric vector is a vector of
distances from the reference object to the other objects, sorted
in ascending order. Several pieces of information related to each object, such
as object IDs or the radii specified so far, are also associated with a metric
vector.

Let $v_p$ be the reference object of a metric vector, $q$ be a query object,
and $R$ be the distance between them. Then, if an object $u$ is inside the
query region of q and radius r, the distance between $u$ and $v_p$ must
satisfy the following inequalities:

$R - r < \phi(v_p, u) < R + r$.

We call this range a candidate range. In a metric vector, a contiguous
region corresponds to a candidate range, and it is called a candidate
region. A candidate region in a metric vector serves as a filter for
the candidate selection, because objects inside a query region must be
located in it. A metric matrix can be viewed as a collection of such filters.
The set of objects retrieved by an r-nearby search is obtained by taking
the intersection of the sets of objects located in the candidate regions of
the metric vectors.
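
A minimal Python sketch of this filtering scheme is given below. It is an illustration under stated assumptions rather than the authors' implementation: the names `MetricVector`, `candidates` and `r_nearby_search` are hypothetical, differential searches and the bookkeeping of radii are omitted, and the candidate range is treated as the closed interval $[R - r, R + r]$, which is a superset of the strict range and therefore still a valid filter.

```python
from bisect import bisect_left, bisect_right
from math import dist as euclidean


class MetricVector:
    """Distances from one reference object to all objects, sorted ascending."""

    def __init__(self, reference, objects, dist):
        self.reference = reference
        # (distance from the reference object, object index) pairs
        self.entries = sorted((dist(reference, o), i) for i, o in enumerate(objects))
        self.keys = [d for d, _ in self.entries]

    def candidates(self, R, r):
        """Indices of the objects in the candidate region for range [R - r, R + r]."""
        lo = bisect_left(self.keys, R - r)
        hi = bisect_right(self.keys, R + r)
        return {i for _, i in self.entries[lo:hi]}


def r_nearby_search(vectors, dist, q, r):
    """Intersect the candidate regions of all metric vectors (no refinement step)."""
    result = None
    for mv in vectors:
        R = dist(q, mv.reference)  # the only distance calculation per metric vector
        cand = mv.candidates(R, r)
        result = cand if result is None else result & cand
    return result if result is not None else set()


objects = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5)]
vectors = [MetricVector(objects[0], objects, euclidean),
           MetricVector(objects[3], objects, euclidean)]
q, r = (0.5, 0.0), 1.0
nearby = r_nearby_search(vectors, euclidean, q, r)
exact = {i for i, o in enumerate(objects) if euclidean(q, o) < r}
```

Here `nearby` contains the object with index 2, which lies outside the query region: the r-nearby search returns a superset of the r-neighbour result, at the cost of only one distance calculation per metric vector.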

5. EXPERIMENTAL RESULTS 

Experiments are based on synthetic data sets. The data sets are prepared
by the procedure described in (Jain et al., 1988), which generates
normally distributed clusters in a d-dimensional vector space. In our
experiments, the data space is two-dimensional, clusters are equally sized
with a variance of 0.1, and the clusters’ centers are uniformly distributed.
Distance is evaluated using the $L_2$ metric, i.e.,
$\phi(u, v) = L_2(u, v) = \sqrt{\sum_{i=1}^{d} (u_i - v_i)^2}$,
where d is the dimension of the data space. The
degree of the metric matrix is $\log_2 |V|$ for a data set of size $|V|$.

Experiments were carried out with two variations derived from our method,
named nnPrim and nnSmallestFirst. Figure 4
shows the result. Notice that a complete graph with 20,000 vertices
has 199,990,000 edges; hence our method needs fewer than 0.2% of the
distance calculations needed by the classical method.
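
The edge count quoted above follows directly from $|V|(|V|-1)/2$. The short check below recomputes it and derives the absolute number that 0.2% corresponds to; the variable names are chosen for this example only, and the derived bound is simple arithmetic on the figures stated in the text, not an additional measurement.

```python
n_vertices = 20_000

# Number of edges of a complete graph: |V| (|V| - 1) / 2
n_edges = n_vertices * (n_vertices - 1) // 2

# 0.2% of the edges, computed with integer arithmetic (2 per 1000)
bound_0_2_percent = n_edges * 2 // 1000

print(n_edges, bound_0_2_percent)
```

In other words, the stated 0.2% corresponds to fewer than about 400,000 distance calculations for 20,000 objects.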




Figure 4 Experimental Results 



6. SUMMARY 

We proposed an approach for MST construction for large complete 
metric graphs utilizing a metric index. Using a metric index, we aimed to 
reduce the number of expensive distance calculations needed to construct 
an MST. To fulfill our purpose, we also invented a new metric index 
named metric matrix which supports differential r-nearby searches. 

Applying the approach to the classical generic greedy method, we 
obtained a generic method for metric graphs. The experiments show 
that the proposed method reduced the number of distance calculations 
significantly. 

It is worth mentioning that density information can be gathered in
the process of our method, since near neighbour searches for vertices are
performed. Recall that the process of MST construction can be viewed
as a process of single-linkage clustering. Hence the clusters obtained
from an MST may suffer from the defects of single-linkage clustering. By
applying the criteria of density-based clustering (Ester et al., 1996), the
resultant clusters might be improved.
Abstract In client-server based interactive information visualization user interfaces, data
sets of many types are made accessible by a server. On the server side, these
data sets can, e.g., be stored in a file system. As soon as a user requests the 
data set, it will be transmitted from the server to the requesting client. A 
different approach is to dynamically construct a data set at request time on the 
server side for visualization on the client side. Another approach for database 
driven information visualization is to request actual data out of a running 
information visualization application on the client side. Both visualization 
methods have in common that changes in the underlying data cannot be 
automatically reflected in changes of the client’s information visualization 
state and context. A solution for this problem is the use of database triggers 
which today are often implemented in existing database management systems 
(DBMS). Database triggers are a mechanism used to define a set of operations 
which are to be executed by the DBMS as soon as a given part of the data in 
the database changes. Using triggers to update an information visualization 
client has the advantage of eliminating unnecessary database requests. 
Moreover, the delay in updating the information visualization on the client 
side is only dependent on the time it takes to send a message from the server to 
the client. This paper describes a conceptual model for such a trigger 
mechanism and evaluates some upcoming problems. Furthermore, the paper 
describes a prototypical proof-of-concept implementation based on commonly 
used technologies and systems. 
1. INTRODUCTION AND MOTIVATION

In client-server-based interactive information visualization user interfaces
(see, e.g., [LHN99], [MLHN99], [RLHA98], [MSSNHPL98]), data sets of
many types are made accessible by a server. Users request such data sets and
client user interfaces visualize them by applying a given information
visualization mapping. On the server side, these data sets can, e.g., be stored
in a file system. As soon as a user requests the data set, it will be transmitted
from the server to the requesting client. A different approach is to
dynamically construct a data set at request time on the server side for
visualization on the client side. This approach is often used in database-driven
information visualization applications. Upon receipt of a user’s request,
a program is executed on the server which retrieves the corresponding data
from a database and generates a data set from it (and possibly some
additional metadata).

A different approach for database-driven information visualization is to
request actual data out of a running information visualization application on
the client side. An information visualization application can therefore
request data at an arbitrary point in time and change its information
visualization state and informational context based on the incoming result
data.

Both visualization methods have in common that changes in the 
underlying data cannot be automatically reflected in changes of the client’s 
information visualization state and informational context. With dynamic 
construction at request time, it would be necessary to retrieve a completely 
new data set from the server every time the database content changes. With 
runtime access, one would need to trigger new requests from the 
visualization client. 

To overcome these limitations, it would be possible in general to let the 
client query the database for possible changes in regular intervals. This 
approach has the obvious drawback that a potentially large number of 
database queries would be executed without the underlying data ever having 
changed. This would in turn result in unnecessary system and network load. 
Furthermore the information visualization client would not be updated 
immediately but would have to wait until the next data retrieval. 

A better solution for this problem is the use of database triggers which,
today, are often implemented in existing database management systems
(DBMS) [WC96a]. Database triggers are a mechanism used to define a set of
operations which are to be executed by the DBMS as soon as a given part of
the data in the database changes. Using triggers to update an information
visualization client has the advantage that no unnecessary database requests
have to be made and that the delay in updating the information visualization
on the client side is only dependent on the time it takes to send a message
from the server to the client. The use of triggers for the notification of
VRML scenes was already postulated in [VDBGW97].

This paper describes a conceptual model for such a trigger mechanism 
and evaluates some upcoming problems. Furthermore, it will describe a 
prototypical proof-of-concept implementation based on commonly used 
technologies and systems. 

The reader is assumed to have some basic knowledge in the area of
DBMSs, as can be gained from [EN94]. Furthermore, a basic understanding
of VRML (see [ISO97a]), the External Authoring
Interface (EAI for short [Mar97]), and the TCP network protocol [RFC793] is
assumed.



2. ACTIVE DATABASE MANAGEMENT SYSTEMS

Conventional DBMSs are passive, which means they only perform
operations on data when explicitly queried by an application. No other
operations are performed. Active DBMSs (ADBMSs), however, are able to
perform operations which are not explicitly defined by an application. A
typical use of an ADBMS is to perform operations as a direct
reaction to modification operations (in SQL: INSERT, DELETE and UPDATE) on
the underlying data. (See the descriptions of research prototypes in [WC96b].)

The active behaviour of an ADBMS can generally be described by a so-called
event-condition-action rule (ECA rule). If the event of an ECA rule
occurs, the condition is tested, and the action is performed if the
condition evaluates to true.
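
The evaluation order of an ECA rule can be illustrated with a small in-memory sketch. The classes `ECARule` and `RuleEngine`, the event name and the example data are hypothetical and serve only to show the event, condition, action sequence; they do not model any particular ADBMS.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ECARule:
    event: str                         # name of the triggering event
    condition: Callable[[dict], bool]  # tested when the event occurs
    action: Callable[[dict], Any]      # performed if the condition holds


class RuleEngine:
    def __init__(self):
        self.rules = []

    def register(self, rule):
        self.rules.append(rule)

    def signal(self, event, data):
        """Signal an event together with its event-related data."""
        for rule in self.rules:
            if rule.event == event and rule.condition(data):
                rule.action(data)


notifications = []
engine = RuleEngine()
engine.register(ECARule(
    event="update",
    condition=lambda d: d["new"] > d["old"],  # react only to increases
    action=lambda d: notifications.append((d["key"], d["new"])),
))

engine.signal("update", {"key": "temperature", "old": 20, "new": 25})  # fires
engine.signal("update", {"key": "temperature", "old": 25, "new": 21})  # condition false
engine.signal("insert", {"key": "pressure", "old": 0, "new": 1})       # other event
```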

Events can, for example, be changes on data or data queries but also be 
defined by some absolute point in time (e.g. 8/1/1999, 8:00 PM). Other 
events defined over time are described in [DBC96]. Furthermore, events can 
be defined by application programs. This could be achieved by registering an 
event under a given name. If the event occurs the application program calls 
the ADBMS referring to the given name. 

All events described so far are simple events. However, events can also 
be composite events, which means they are defined on the basis of other 
events. A composite event can e.g. be defined by two simple events which 
have to occur sequentially. A system which supports composite events is for 
example HiPAC [DBC96]. 
The condition in an ECA rule is based on the internal state of the
database, the event-related data, and other information accessible by the
DBMS. Event-related data can be, e.g., the new value and the old value of the
changed data in a data modification operation.

In [WC96c] the possible actions of an ECA rule are classified as 

■ Data modification operations 

■ Data retrieval operations 

■ Other database commands 

■ Application procedures. 

In this context the case of application procedures shall be examined: An 
action can be defined by a call to a procedure defined in a programming 
language such as C. Therefore it is possible to implement a notification of 
systems external to the database system if the procedure sends data to the 
external system. 



3. COUPLING MODES 

In the following, it is assumed that an event always occurs inside of a 
transaction (this would not apply to time-based events) which is the only 
relevant case in the context of this paper. Within the execution of an ECA 
rule there are different possibilities for the evaluation of the condition and 
the execution of the action relative to the transaction which triggered the 
event of the ECA rule (called triggering transaction). These possibilities are
called coupling modes. In this context we will briefly describe the coupling
modes defined in the HiPAC project [DBC96] and two other coupling modes
from [BBKZ93].

In particular the coupling modes are: immediate, deferred, detached, 
parallel causally dependent, sequential causally dependent, and exclusive 
causally dependent. These modes can describe the evaluation of the 
condition in relation to the causal transaction as well as the execution of the 
action relative to the transaction in which the condition is evaluated. For 
reasons of simplicity only the evaluation of the condition is considered. 

With immediate coupling the condition is evaluated immediately after the
occurrence of the event inside the triggering transaction. The normal
execution of the triggering transaction is stopped in the meantime. With
deferred coupling the condition is evaluated at transaction end (but before
the commit) inside the triggering transaction. With detached coupling a
new transaction is started for the evaluation of the condition, which runs in
parallel to and independently of the triggering transaction. Parallel causally
dependent is the same as detached, except that the new transaction can only
terminate regularly if the triggering transaction terminates regularly. With
sequential causally dependent a new transaction is started as in the two
previous cases. The execution start time, however, lies after the commit of the
triggering transaction. That is, the new transaction is only started if the
triggering transaction has committed. With exclusive causally dependent a
new transaction is started for the evaluation of the condition which only
reaches its commit point if the triggering transaction is aborted. If the new
transaction has already been started at the commit point of the triggering
transaction, it will be reset.

In HiPAC, all coupling modes can either be assigned to a condition or an 
action of an ECA rule. In other ADBMSs (e.g. Ode [GJ96]) the coupling 
modes cannot be explicitly set. However, it is possible to achieve an 
equivalent execution of ECA rules by using composite events [GJ96]. 

For broader discussions on ADBMS, it is recommended to read [WC96b, 
Day95, DGG95, Buc94]. 



4. TRIGGERS IN SQL3 

This section briefly outlines how triggers are represented in
a draft version of the SQL3 standard [ISO97b]. The representation is based
on an interpretation of the draft standard by the author.

The general SQL3 syntax for a trigger is: 

<trigger definition> ::=
    CREATE TRIGGER <trigger name>
    <trigger action time> <trigger event>
    ON <table name>
    [REFERENCING <old or new values list>]
    <triggered action>

<triggered action> ::=
    [FOR EACH {ROW | STATEMENT}]
    [WHEN ( <search condition> )]
    <triggered sql statement>

With the CREATE TRIGGER statement a new trigger with the name
<trigger name> is defined in a database. <trigger event> and <table name>
define which operation has to be performed on which table to fire the
trigger. <search condition> defines a condition which has to be true for
<triggered sql statement> to be executed. FOR EACH ROW specifies that the
trigger is executed for each affected row, and FOR EACH STATEMENT specifies that
the trigger is executed only once for the whole triggering statement. With
<trigger action time> set to BEFORE (AFTER) the trigger is executed
before (after) the change in the table becomes permanent. With the
REFERENCING keyword, names can be defined for old and new table
values which can be referenced in the <triggered action>.

SQL3 triggers can be seen as ECA rules. The event is defined by the
modification operation INSERT, DELETE or UPDATE. <search condition> is
equivalent to the condition and <triggered sql statement> is equivalent to the
action. For this paper the coupling between event, condition and action is
relevant. The coupling mode between the triggering event and the evaluation
of the condition is immediate. The coupling mode between the evaluation of
the condition and the execution of the action is also immediate. In the next
section, we will see an obvious drawback of using this coupling mode for
notifying external systems with the help of database triggers.
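
To make the correspondence between a trigger and an ECA rule concrete, the following sketch defines a row-level trigger using Python's standard `sqlite3` module. Note the assumptions: SQLite's trigger dialect is used here as a readily runnable stand-in and is not identical to the SQL3 draft (for instance, it has no REFERENCING clause and uses the fixed names OLD and NEW, and the action is wrapped in BEGIN ... END). The table and trigger names are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (item TEXT PRIMARY KEY, price REAL);
CREATE TABLE price_log (item TEXT, old_price REAL, new_price REAL);

-- Event: UPDATE on stock; condition: price increased; action: log the change.
CREATE TRIGGER log_price_rise
AFTER UPDATE ON stock
FOR EACH ROW
WHEN NEW.price > OLD.price
BEGIN
    INSERT INTO price_log VALUES (NEW.item, OLD.price, NEW.price);
END;
""")

conn.execute("INSERT INTO stock VALUES ('bolt', 1.0)")
conn.execute("UPDATE stock SET price = 1.5 WHERE item = 'bolt'")  # condition true
conn.execute("UPDATE stock SET price = 1.2 WHERE item = 'bolt'")  # condition false
rows = conn.execute("SELECT item, old_price, new_price FROM price_log").fetchall()
```

Only the first update satisfies the WHEN condition, so exactly one row is written to `price_log`.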



5. ACTIVE DBMS AND EXTERNAL SYSTEMS 

In [BBKZ93] the coupling mode sequential causally dependent was
introduced (described in Section 3) to be able to handle systems external to
the database in which actions cannot be undone (which is characteristic of
database-driven client/server information visualization scenarios). Suppose
the action of an ECA rule has executed an operation on some external
system in immediate mode. The operation on the external system then has to be
reset if the triggering transaction is reset, because the underlying event has
effectively never occurred. However, if the external operation cannot be
undone, it remains executed even though the triggering
transaction has been reset. An example of an external operation that cannot
be undone is given in [BBKZ93]: the opening of a valve through
which a fluid runs. This problem occurs not only with the immediate
coupling mode, but generally whenever an action with irreversible
consequences is executed on an external system before the triggering
transaction commits (that is, also with the modes deferred and parallel
causally dependent).

With the coupling mode set to sequential causally dependent, the new
transaction is started after the transaction which generated it
has committed. It is therefore ensured that the transaction from
which the new transaction was started will never be reset after the new
transaction has started. If, e.g., an action is executed with sequential causally
dependent coupling, the coupling for the condition is immediate, and the action
leads to an operation on an external system, the external operation will
never have to be reset.

In [HCD98] the described problem (of notifying an external system 
before the triggering transaction has committed) is called the dirty 
dependent operation problem (DDO problem for short). Furthermore, [HCD98] 
describes a software architecture in which application programs can register 
at the database for a certain event. As soon as the event occurs, the 
registered application is called. The DDO problem is overcome by delaying 
the notification of the application program about an occurring event until 
after the triggering transaction has committed. 
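
The idea of delaying notifications until commit can be illustrated with a small sketch. It is not the architecture of [HCD98]; the class Transaction and its methods raise_event, commit, and rollback are hypothetical names chosen for illustration only.

```python
from typing import Callable, List


class Transaction:
    """Toy transaction that buffers external notifications until commit."""

    def __init__(self, notify: Callable[[str], None]) -> None:
        self._notify = notify          # callback into the external system
        self._pending: List[str] = []  # events raised but not yet published

    def raise_event(self, event: str) -> None:
        # Called from a rule action: only record the event, do not send it.
        self._pending.append(event)

    def commit(self) -> None:
        # The modification is permanent now, so notifying is safe.
        for event in self._pending:
            self._notify(event)
        self._pending.clear()

    def rollback(self) -> None:
        # The events effectively never occurred, so they are dropped.
        self._pending.clear()


received: List[str] = []

t1 = Transaction(received.append)
t1.raise_event("valve_opened")
t1.rollback()                  # nothing reaches the external system

t2 = Transaction(received.append)
t2.raise_event("price_changed")
t2.commit()                    # delivered only after the commit
```

Because the external system is called only from commit(), no irreversible operation can be executed on behalf of a transaction that is reset later.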



6. COUPLING OF TRIGGERS WITH INFORMATION 
VISUALIZATION APPLICATIONS 



6.1 Requirements 

The basic problem has already been mentioned in the introduction and 
will be discussed here in more detail. A data set requested from a server is 
visualized across several client applications via some visualization 
component (e.g. a browser plug-in). The set visualizes information carried 
by data which is permanently kept in a database. Either the clients read 
the data directly from the database, or the server reads from the database 
upon receipt of a request and then generates a file for an information 
visualization based on the received data. Figure 1 outlines this scenario 
with both variants of database driven information visualization. 




Figure 1: Environment of a system for the notification of information visualization clients 
through database triggers 

Besides the visualization clients there are other clients which access the 
DBMS. Our goal is to have the database triggers notify the visualization 
clients about changes in the database in order to update the actual 
information visualization on the client side. This should also work with data 
modifications caused by existing software systems which have not yet been 
used in such a scenario. 

The following requirements have been considered in the upcoming 
development of the architectural model: 

■ The system should not depend on specific features of the 
visualization clients. 

■ The system should not depend on specific features of the 
underlying database system. 

The first requirement (not to use client application specific features) 
ensures broad usability across different client types. The second 
requirement makes sure that the developed solution is easy to adapt to 
different DBMSs. 

6.2 Triggers and External Systems 

In this section, we will examine how database triggers can be used to 
notify external systems about modifications in a database. It is assumed that 
the actions triggered by a notification of an external system cannot be 
reversed. This is in particular the case with information visualization 
applications. Information visualization applications typically wait for events 
either triggered by direct user interaction or from outside sources. As soon as 
the application handles a specific event, it modifies its visualization state 
which, in turn, results directly in a change of the screen shown to the user. 
Because the user perceives such changes immediately, they cannot be 
reversed afterward. 

The results of this section should be applicable to SQL3, since SQL3 
triggers are executed immediately within the triggering transaction (see 
Section 4). It is furthermore assumed that it is possible to notify an external 
system from the action part of a trigger in a given DBMS. We will explain 
later in detail how this is achieved in our prototypical implementation. 

For now, there is the following problem: Since an action is executed 
within the triggering transaction, it is possible that a notification of the 
external system takes place although the transaction is reset afterwards. 
Because effectively no modification has occurred in the database, there 
should be no notification of the external system, unless we are interested in 
the information that a modification attempt has taken place (Note: This case 
is not considered further in this paper). This problem was already described 
generally for active databases above. It therefore does not make sense to 
notify an external system about database modifications directly in the action 
of a trigger and consequently update a visualization on a user display, 
because the modification has possibly never taken place. This problem can be 
tackled through at least two approaches: 

■ If the external system is interested in updating data from a database 
which is kept locally, an additional software component can be 
inserted between the DBMS and the external system which is 
notified by the action part of the trigger. The component interprets 
the notification to the effect that the data in the database which is 
mirrored locally has possibly changed. The relevant data is then read 
anew by a database query and sent to the external system. 

■ An additional software component is introduced which uses a 
database query to test upon notification by a trigger if the triggering 
transaction has terminated regularly. Only if the test is positive, the 
external system will be notified. 

Both variants will be examined in more detail. It is furthermore assumed 
that applications which modify the database will reset the transaction via 
ROLLBACK if the modification is erroneous. 

6.2.1 Updating Data on Possible Changes 

Figure 2 displays the general architecture of a system which queries a 
database (2) upon notification by a trigger (1). The results of the query (3) 
are then sent to the external system (4). 






Figure 2: Architecture of a system in which data on an external system is updated as a 
reaction to a trigger event in a database. 

The query is issued by a newly introduced middleware component. An 
example of an external system could be an application which shows a stock 
index. The middleware component ensures that the value shown on the 
screen always matches the actual value in the database. 

If the DBMS keeps its write locks until the end of the transaction, the 
transaction which performs the query as a result of the notification by the 
trigger has to wait until the triggering transaction releases its write 
locks. It therefore reads the data only after the triggering transaction has 
ended. If the triggering transaction terminates regularly, the modified data 
is read. If the triggering transaction is reset, the old unmodified data 
values are read again. This is no problem since, in this case, the old values 
are still up to date. Only the read process costs some time. 

In both cases only persistently stored data is read from the database. The 
problem that data which has not yet been stored permanently in the database 
is transmitted to an external system does not exist any more. 

It is possible that the data is modified by another transaction before it is 
read by the middleware component. In this case the data modified by this 
other transaction is read immediately. The notification through this other 
transaction then leads to the same data being read again which is equivalent 
to the case of the rolled back transaction. In the worst case, we have an 
unnecessary database access. 
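
The behavior described in this section can be sketched as follows. The example is only an illustration under simplifying assumptions: it uses an in-memory SQLite database instead of a server DBMS, the table stock_index and the class RefreshMiddleware are hypothetical, and the lock wait is simulated by calling the middleware only after the triggering transaction has ended.

```python
import sqlite3
from typing import List


class RefreshMiddleware:
    """Re-reads the mirrored value whenever a trigger notification arrives."""

    def __init__(self, conn: sqlite3.Connection, display: List[float]) -> None:
        self._conn = conn
        self._display = display  # stands in for the external system

    def on_trigger_notification(self) -> None:
        # In a server DBMS this read would block until the triggering
        # transaction releases its write locks; here it is simply called
        # after that transaction has ended.
        row = self._conn.execute("SELECT value FROM stock_index").fetchone()
        self._display.append(row[0])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_index (value REAL)")
conn.execute("INSERT INTO stock_index VALUES (100.0)")
conn.commit()

shown: List[float] = []
middleware = RefreshMiddleware(conn, shown)

conn.execute("UPDATE stock_index SET value = 105.5")
conn.rollback()                       # triggering transaction is reset
middleware.on_trigger_notification()  # re-reads the old, still valid value

conn.execute("UPDATE stock_index SET value = 101.25")
conn.commit()                         # triggering transaction commits
middleware.on_trigger_notification()  # reads the modified value
```

After the rollback the middleware merely repeats a read of the unchanged value; the value written by the reset transaction never reaches the display.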

6.2.2 Checking if the Triggering Transaction has Terminated Regularly 

The general architecture of a system which checks if the triggering 
transaction has terminated regularly after the action part of the trigger has 
been executed is displayed in Figure 3. 






Figure 3: Architecture of a system for the notification of an external system through a 
database trigger which checks if the triggering transaction terminates regularly. 

As in the previous case it involves a middleware component between the 
DBMS and the external system. If the action part of a trigger notifies the 
middleware component (1), the component performs a test query on the 
database (2) to test whether the triggering transaction has terminated 
regularly. If the test is positive (3) the external system is notified (4). How 
this can be done in detail is outlined by the following example of an 
UPDATE trigger. 

It is assumed that the following tables are stored in a database: 

Triggering (N1 T1, ..., Nn Tn) 

Counter (c INTEGER) 

The table Triggering is a table on which a trigger is to be defined. Its 
columns have the names N1, ..., Nn and the data types T1, ..., Tn. The table 
Counter is an auxiliary table which holds only one row containing a single 
value. A trigger shall be defined on the table Triggering as follows: 

CREATE TRIGGER Update_Triggering 
UPDATE ON Triggering 
AFTER ( 
  EXECUTE PROCEDURE inc_counter(), 
  EXECUTE PROCEDURE notify_component() ); 

The procedure inc_counter() increments the value stored in the Counter 
table by one. If the maximum integer value given by the system is reached, it 
proceeds with the smallest possible value. The procedure 
notify_component() notifies the middleware component. Again we assume 
that the DBMS keeps the write locks until the end of a transaction. 

The middleware component has an internal counter which is initialized to 
0 at the beginning. The value in the table Counter is also set to 0 by the 
middleware component. If rows in the table Triggering are now modified, 
the action part of the trigger is executed: the value in the table Counter is 
incremented by 1 and the middleware component is notified. The middleware 
component then reads the table Counter and compares the value to its 
internal counter. If the two values are not equal, the external system is 
notified and the internal counter is incremented by 1. Of course, it is 
assumed that both counters have the same value range and that the internal 
counter also proceeds with the minimal value if the maximum value is 
exceeded. If, however, the internal counter is equal to the value of the 
table Counter, the external system is not notified. If the middleware 
component receives notifications which cannot be processed immediately, 
they are buffered. 

With this approach it is assured that a notification is only performed after 
the triggering transaction has committed and therefore only if the data 
modification that caused the trigger event has been stored permanently in the 
database. The reason for this can be explained as follows: 

The triggering transaction holds a write lock on the value in the table 
Counter until the end of the transaction, because the action part of the 
trigger increments this value. The transaction with which the middleware 
component reads the table Counter therefore has to wait until the 
triggering transaction has ended. If the triggering transaction commits, the 
incremented value remains in the database table, which means it is unequal 
to the value of the component's internal counter. The external system is 
notified as a result. If, however, the triggering transaction is reset, the 
increment of the value in the table Counter is undone and the external 
system is not notified, because the internal counter value is equal to the 
table Counter value. It is also possible that another transaction fires the 
trigger after the triggering transaction but before the value of the table 
Counter is read by the middleware component. Because the difference between 
both counters is equivalent to the number of triggering transactions which 
have terminated regularly but have not yet led to a notification of the 
external system, the external system is not notified too often. It is 
assumed that the value in the table Counter never runs so far ahead of the 
internal counter that both counters become equal again. 
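
The counter comparison can be sketched in a few lines. This is a simplified illustration, not the prototype: the 32-bit value range, the class CommitCheckMiddleware, and the helper wrap_increment are assumptions, and the table Counter is simulated by a dictionary whose value is changed only by committed transactions.

```python
from typing import Callable, List

MAX_INT = 2**31 - 1   # assumed value range of both counters
MIN_INT = -(2**31)


def wrap_increment(value: int) -> int:
    """Increment by one, proceeding with the smallest value after the largest."""
    return MIN_INT if value == MAX_INT else value + 1


class CommitCheckMiddleware:
    def __init__(self, read_counter: Callable[[], int],
                 notify_external: Callable[[], None]) -> None:
        self._read_counter = read_counter        # reads the table Counter
        self._notify_external = notify_external
        self.internal = 0                        # internal counter

    def on_trigger_notification(self) -> None:
        # In a real DBMS this read waits for the write lock held by the
        # triggering transaction, so it sees the value after commit or reset.
        if self._read_counter() != self.internal:
            self._notify_external()
            self.internal = wrap_increment(self.internal)


table = {"c": 0}      # stands in for the single row of the table Counter
sent: List[str] = []
mw = CommitCheckMiddleware(lambda: table["c"], lambda: sent.append("changed"))

table["c"] = wrap_increment(table["c"])  # transaction 1 commits its increment
mw.on_trigger_notification()             # counters differ: external system notified

# Transaction 2 is reset: its increment is undone, the table value is unchanged.
mw.on_trigger_notification()             # counters equal: no notification
```

Only the committed transaction leads to a notification; the notification caused by the reset transaction finds both counters equal and is discarded.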

With this approach, it should even be possible to send the data which is 
made available by the trigger to the external system. If the database trigger 
is created with the FOR EACH ROW keywords, it is possible to access the 
old and new row values within the trigger's action part. If these values 
are temporarily stored in a table or sent with the notification procedure, a 
transfer to the external system should be possible. This approach is not 
considered further in this paper. 

6.3 Distribution of Messages 

In the previous section, it was implicitly assumed that there is a fixed 
external system which is notified if the action part of a trigger is executed. 
However, a client-server based system for the notification of information 
visualization clients has to notify every client that visually displays a local 
set of data. It is generally not known in advance which clients will visualize 
a given portion of data because the data can be requested from the server by 
an arbitrary client machine. 




Figure 4: Distribution of messages across clients 

A special system component could handle the distribution of messages to 
the clients. Figure 4 displays the general architecture. If the distribution 
component receives a message (1), it forwards the message to the 
clients (2). 

How the distribution component knows which clients to notify and which 
protocol to use is not described in detail at this point. 
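
A minimal sketch of such a distribution component is given below. Since the mechanism for determining the clients is left open here, the sketch simply assumes that clients register themselves explicitly; the class DistributionComponent and its methods are hypothetical.

```python
from typing import Callable, Dict, List


class DistributionComponent:
    """Forwards every incoming message to all currently registered clients."""

    def __init__(self) -> None:
        self._clients: Dict[str, Callable[[str], None]] = {}

    def register(self, client_id: str, deliver: Callable[[str], None]) -> None:
        self._clients[client_id] = deliver

    def unregister(self, client_id: str) -> None:
        self._clients.pop(client_id, None)

    def distribute(self, message: str) -> int:
        # Step (1): a message has arrived. Step (2): send it to every client.
        for deliver in list(self._clients.values()):
            deliver(message)
        return len(self._clients)


inbox_a: List[str] = []
inbox_b: List[str] = []

component = DistributionComponent()
component.register("client-a", inbox_a.append)
component.register("client-b", inbox_b.append)
component.distribute("data possibly changed")

component.unregister("client-b")
component.distribute("data possibly changed again")
```

In this sketch a client that has unregistered no longer receives messages, which mirrors the fact that the set of visualizing clients is not known in advance.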

6.4 Discussion of Architectures 

This section describes possible architectures of a system for the notification 
of information visualization clients through database triggers. The different 
possibilities result from a combination of the general architectures depicted 
in Figure 2 and Figure 3 with the partial architecture for the distribution of 
messages as depicted in Figure 4. Furthermore, the possibility of accessing a 
database directly from an information visualization client is considered. 




Figure 5: Architecture of a system in which information visualization clients are updated as a 
result of a trigger being executed. 

Figure 5 and Figure 6 display architectures of systems which are based 
on the architecture of Figure 2. In the architecture shown in Figure 5, the 
functionality of the middleware component is fully performed by the 
information visualization clients. If the action part of a trigger notifies the 
clients (2) via the distribution component (1) that a data modification has 
possibly taken place, the clients perform database queries (3) and receive the 
results (4). Depending on the query results, the clients update their visual 
state. It is entirely possible that different clients send different 
queries to the database system. In a scenario in which the same information 
visualization application runs on several clients, this can happen, e.g., if 
user interaction with the application causes different parts of the overall 
database to be visualized on different clients, so that different 
queries have to be run against the database. 

In Figure 6, by contrast, it is assumed that all clients visualize the same 
portion of the database. The middleware component works as described 
in Section 6.2.1. The distribution component distributes the data to the 
clients. The advantage of this architecture over the previous one is that 
only one database query has to be performed to update the information 
visualization clients, which reduces the load on the DBMS. If, in the 
previous architecture (of Figure 5), all clients performed the same database 
query against the DBMS, the database would execute the same query multiple 
times, which would generate unnecessary workload. 




Figure 6: Architecture of a system in which a trigger causes a database query to be used to 
update data on several information visualization clients. 

The last architecture described is a combination of the partial 
architectures depicted in Figure 3 and Figure 4. This architecture is shown in 
Figure 7. The action part of a trigger notifies the middleware component (1), 
which now checks if the triggering transaction has terminated regularly (2). 
If this is the case, the distribution component is notified (4). The 
distribution component then notifies the visualization clients (5). Only this 
architecture allows a system to be implemented in which actions are 
performed on the visualization clients if a transaction has terminated 
regularly. 




Figure 7: Architecture of a system in which the regular termination of the triggering 
transaction is tested. 

7. A SYSTEM FOR THE NOTIFICATION OF VRML WORLDS 
THROUGH DBMS TRIGGERS 

In this section, we describe a proof-of-concept prototypical 
implementation of a system for the notification of several VRML worlds 
running on client machines about modifications in a DBMS. The presented 
system is based on the observations in earlier sections and on the 
architecture in Figure 5 in which clients are notified about a possible change 
in the underlying database. The clients then read from the database and 
update their display based on the retrieved data. 

The DBMS used in this specific implementation is the Informix Dynamic 
Server (IDS), which is a widely used relational database system. However, 
the implementation techniques used in the given scenario can be applied to 
most relational DBMSs available on the market today. The IDS implements 
the database trigger mechanism in compliance with the SQL3 draft standard 
[ISO97b] and is therefore well suited for the given purpose. 

Furthermore, we have used the standard web browser Microsoft Internet 
Explorer 4.01 and the VRML viewer control WorldView 2.1 for Internet 
Explorer from InterVista, with the recommended JSAI patch installed, as 
VRML-based information visualization clients. The operating system 
environment is Microsoft Windows NT 4.0 SP3. However, to ensure broad 
usability, no browser features were used which go beyond the VRML 
standard [ISO97a] and the EAI [Mar97]. 

7.1 External Sensors 

If we want to notify a running VRML world through triggers, there has to 
be a way to determine how the notification affects the VRML scene. 
We have chosen a solution which introduces a new VRML node type called 
ExternalSensor. This node provides VRML events through its eventOut 
fields when notified by a database trigger. However, this type of node can in 
fact be notified by any type of external system. 

An ExternalSensor node in a VRML scene forms an interface to a so-called 
external sensor. An external sensor is an entity external to the VRML 
world which has a given name, an interface, and a behavior. The external 
sensor's interface defines the data types for the values which can be sent to 
a VRML scene. An external sensor is typically a part of a running sensor 
server process. A sensor server is a program to which VRML clients can log 
on and off. 

The VRML syntax of the ExternalSensor node is as follows: 

ExternalSensor { 
  eventIn   SFBool    set_enabled 
  field     SFBool    enabled          TRUE 
  eventOut  SFBool    enabled_changed 
  field     SFString  sensorId         "" 
  field     SFString  server           "" 
  field     SFInt32   port             22155 
  field     MFNode    eventOuts        [] 
} 

For the design of the ExternalSensor node, the existing sensor nodes from 
[ISO97a] were examined and the widely used exposedField enabled was 
adopted (in our case split up into an eventIn, a field, and an eventOut due 
to implementation issues). The eventIn set_enabled can be used to log on 
to or off from a given external sensor: sending TRUE (FALSE) logs on to 
(off from) the external sensor. On log on, the value TRUE is available 
immediately at the eventOut enabled_changed, possibly before the log on to 
the external sensor has completed. Nodes which are connected to 
enabled_changed via ROUTE, or nodes in the field eventOuts, therefore 
cannot be sure to receive all events from an external sensor after receipt 
of TRUE from enabled_changed. That means notification messages can be lost 
at the beginning. As soon as FALSE is sent to set_enabled, the transmission 
of events through the eventOuts field is stopped immediately, i.e., one does 
not wait until the log off process at the external sensor has finished. 

Unlike the built-in VRML sensors, which have a clearly defined 
functionality, the node type ExternalSensor can be connected to different 
external sensors via the field sensorId. The value of the field sensorId is 
the unique name of an external sensor. Since external sensors are provided 
by sensor servers, it has to be determined on which server machine and on 
which port the server can be reached. This is achieved through the fields 
server and port, where the value of server is the Internet address of the 
server and the value of port is the port number on which the sensor server 
is listening. The standard port number is 22155, which should not be 
occupied by other services (e.g., a web server) [Tan96]. 

The interface of an ExternalSensor node, which has to match the interface 
of its external sensor, is made available through the field eventOuts. This 
field contains nodes which in turn each provide one single event output. 
For this purpose we have defined a node type named EventOutT for every 
existing basic VRML field type T; it contains an exposedField named value 
of the type T. As an example we give the syntax of the node type 
EventOutSFFloat: 

EventOutSFFloat { 

exposedField SFString id 
exposedField SFFloat value 0 

} 

For each pair (N, T) from the interface of an external sensor, a node of 
the type EventOutT with an id value of N can be inserted in the eventOuts 
field. The exposedField value provides an event if a message of interest is 
received from the external sensor. This is the case if a message contains a 
pair (N, W), with W being the transmitted value, in which N is the same 
string as in the node's id field. If the interface of the external sensor is 
known, appropriate nodes can be inserted in the field eventOuts. Given that 
sensor-id is the name of a sensor, server-address is the address of a sensor 
server, server-port is the port number, and {(N1, T1), ..., (Nn, Tn)} is the 
interface of the sensor, a node of the type ExternalSensor can be used as 
follows: 

ExternalSensor { 
  enabled    enabled-start-state 
  sensorId   "sensor-id" 
  server     "server-address" 
  port       server-port 
  eventOuts [ 
    DEF id1 EventOutT1 { id "N1" } 
    ... 
    DEF idn EventOutTn { id "Nn" } 
  ] 
} 

ROUTE id1.value TO ... 
... 
ROUTE idn.value TO ... 

enabled-start-state determines if the node should be activated at the 
beginning. id1, ..., idn are arbitrary identifiers. Note that not every pair 
of an external sensor interface has to have a corresponding node in the 
eventOuts field. It is enough to list the nodes which are really used in a 
VRML scene. 

An example of the use of an ExternalSensor node is given below. Three 
temperatures from three different locations A, B, and C are visualized 
in a VRML scene. The corresponding sensor is named “temperature”, the 
server is named “tempserver”, and the port number is 22155. The exact 
visualization is not detailed further here, but the temperatures are made 
available as three eventOut fields of the type SFFloat. 

The interface of the external sensor is as follows: 

Sensor temperature 

Name             Type 
temperature_A    SFFloat 
temperature_B    SFFloat 
temperature_C    SFFloat 

The external sensor is used in the VRML scene as follows: 

EXTERNPROTO ExternalSensor [ 
  eventIn   SFBool    set_enabled 
  field     SFBool    enabled 
  eventOut  SFBool    enabled_changed 
  field     SFString  sensorId 
  field     SFString  server 
  field     SFInt32   port 
  field     MFNode    eventOuts 
] 
"http://tempserver/nodes.wrl#ExternalSensor" 

EXTERNPROTO EventOutSFFloat [ 
  exposedField SFString id 
  exposedField SFFloat  value 
] 
"http://tempserver/nodes.wrl#EventOutSFFloat" 

ExternalSensor { 
  sensorId   "temperature" 
  server     "tempserver" 
  port       22155 
  eventOuts [ 
    DEF a EventOutSFFloat { id "temperature_A" } 
    DEF b EventOutSFFloat { id "temperature_B" } 
    DEF c EventOutSFFloat { id "temperature_C" } 
  ] 
} 

ROUTE a.value TO ... 
ROUTE b.value TO ... 
ROUTE c.value TO ... 



7.2 Communication Protocol 

The communication between VRML nodes of the type ExternalSensor 
and an external system is handled by a protocol which is based on the 
well-known transmission control protocol (TCP). As soon as a VRML client 
wants to communicate with a sensor server, it establishes a TCP connection 
to the server and communicates with it via defined messages. If a VRML 
client wants to be notified about certain events (which can be different for 
every sensor) after registering with a sensor server, it has to keep the 
connection to the server open because the connection is used for the 
notification. 

The messages are text-based and coded using ISO-8859-1 [ISO87]. Possible 
messages include requests for information about an external sensor, 
registering/unregistering with a sensor, sending data to a VRML client, and 
closing the connection. The exact syntax of the message format is not 
covered within the scope of this paper. 

Examples of messages based on this protocol are given below. They are 
based on the example in Section 7.1: 

Register with the temperature server for the external sensor temperature: 

REGISTER temperature # 

Unregister with the temperature server for the external sensor temperature: 

UNREGISTER temperature # 

Request the interface of the sensor temperature: 

GETINFO temperature # 

Temperature server sends the interface of the external sensor temperature: 

SENSORINFO temperature 
SFFloat temperature_A 
SFFloat temperature_B 
SFFloat temperature_C # 

Temperature at location A has changed: 

EVENT temperature 
temperature_A 22.5 # 

All temperatures have changed: 

EVENT temperature 
temperature_A 22.5 
temperature_B 27.2 
temperature_C 19.8 # 
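
How a client might decode an EVENT message can be sketched as follows. Because the exact message syntax is not specified in this paper, the sketch relies on assumptions read off the examples above: tokens are separated by whitespace, a message ends with the character #, and values do not contain whitespace. The function parse_event is hypothetical.

```python
from typing import Dict, Tuple


def parse_event(message: str) -> Tuple[str, Dict[str, str]]:
    """Split an EVENT message into the sensor name and its name/value pairs."""
    body = message.strip()
    if not body.endswith("#"):
        raise ValueError("message is not terminated by '#'")
    tokens = body[:-1].split()
    if len(tokens) < 2 or tokens[0] != "EVENT":
        raise ValueError("not an EVENT message")
    sensor, rest = tokens[1], tokens[2:]
    if len(rest) % 2 != 0:
        raise ValueError("name/value pairs are incomplete")
    # Pair every name with the value that follows it.
    return sensor, dict(zip(rest[0::2], rest[1::2]))


sensor, values = parse_event(
    "EVENT temperature\n"
    "temperature_A 22.5\n"
    "temperature_B 27.2\n"
    "temperature_C 19.8 #"
)
```

The values are kept as strings here; an ExternalSensor node would convert each value according to the field type declared in the sensor interface (SFFloat in this example) before raising the event on the EventOut node whose id matches the name.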

7.3 Trigger Server 

The trigger server we have implemented for the notification of VRML 
scenes through database triggers can be regarded as a special case of a 
sensor server. The trigger server is configured through a file which 
specifies which trigger events are permitted from which machines. It is also 
possible to specify that a trigger event can come from any machine. By 
restricting the permitted machines, we can ensure that no events can be 
faked and sent from machines other than the database server. The file format 
is very simple: A configuration file contains a list of names of trigger 
sensors, each followed by an internet address or *. The character * is a 
wildcard for all machines. An example of a configuration file in which two 
trigger sensors are defined follows: 

new_person * 

changed_temperature sensormachine 

The trigger sensor new_person accepts trigger events from an arbitrary 
machine. The trigger sensor changed_temperature, however, only accepts 
messages from the machine with the address “sensormachine”. 
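
The permission check implied by this file format can be sketched as follows. The functions parse_trigger_config and is_allowed are hypothetical, and the sketch assumes one entry per line with name and address separated by whitespace.

```python
from typing import Dict


def parse_trigger_config(text: str) -> Dict[str, str]:
    """Map each trigger sensor name to its permitted machine or '*'."""
    sensors: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip empty lines
        if len(parts) != 2:
            raise ValueError(f"malformed configuration line: {line!r}")
        name, machine = parts
        sensors[name] = machine
    return sensors


def is_allowed(sensors: Dict[str, str], sensor: str, sender: str) -> bool:
    """Accept an event only for a known sensor and a permitted sender."""
    permitted = sensors.get(sensor)
    return permitted is not None and permitted in ("*", sender)


config = parse_trigger_config(
    "new_person *\n"
    "\n"
    "changed_temperature sensormachine\n"
)
```

With this configuration, is_allowed(config, "new_person", "anyhost") holds for any sender, whereas events for changed_temperature are accepted only if the sender is sensormachine.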

Note: If the VRML browser component only allows network connections 
to the machine from which the VRML scene was downloaded, as is the case 
with InterVista’s WorldView 2.1, the trigger server must also run on this 
machine. 

7.4 Notification Procedure 

The procedure notify is defined and stored as part of an IDS database. 
In a system containing a database, a trigger server, and VRML clients, this 
procedure is used within the action part of a database trigger to inform a 
trigger server. It has the following signature: 

notify (server_address LVARCHAR, port INT, 
        sensor_id LVARCHAR) 

The procedure notifies the trigger server with the address server_address 
running on port port about the occurrence of an event for the sensor 
sensor_id. The trigger server is thereby instructed to inform all VRML 
clients which are registered for the sensor sensor_id. 

7.5 Implementation Architecture 

The overall implementation architecture which uses the components 
described above to notify VRML worlds about changes in a database is 
displayed in Figure 8. The system contains VRML worlds, a trigger server, 
and a DBMS. The files containing the VRML world are requested from a 
web server which is not explicitly depicted in the figure because it does not 
participate in the notification process. The VRML worlds register with a 
trigger server to be notified by a certain trigger. The trigger server 
confirms a registration in order to inform the VRML world of the exact point 
in time from which it is guaranteed to receive all trigger events from the 
database system. As a reaction, the VRML world can start a database query to 
read its data and start the visualization process. Thereafter the data can 
be read anew each time a trigger event occurs. 




Figure 8: Implementation architecture which uses the described components to notify VRML 
worlds about changes in a database 

If a VRML world does not want to receive any more trigger events, it can 
unregister with the trigger server. A VRML world can therefore register and 
unregister multiple times with a trigger server. 

If the action of the trigger is executed, the trigger server is notified via 
the notify() procedure. The trigger server then notifies all VRML worlds 
registered for this event. 



8. CONCLUSIONS AND OUTLOOK 

We have outlined several possibilities of how a system could be 
structured in which database triggers are used to update information 
visualization client applications on changes of the underlying database. 
Furthermore, we have described components which have been designed and 
prototypically implemented. These components can be used in the 
development of a system in which VRML scenes are notified through 
database triggers, and their use should considerably ease the development 
process in such a scenario. In particular, a server has been developed at 
which VRML clients can log on to be informed about possible changes in a 
database. A procedure which is stored in an Informix Dynamic Server DBMS 
has been developed. This procedure allows triggers stored in a database to 
notify the server. Furthermore, a VRML node type has been developed that 
can be used in arbitrary VRML scenes which are interested in database 
trigger notification. This node type performs all necessary actions to log 
on to the server component and is also responsible for the transmission of 
VRML events as a reaction to trigger notification. The new node type can 
even be used in a more general way: notifications coming from arbitrary 
systems in the Internet can be included in a VRML scene using this node. 
For this purpose, a protocol has been developed that handles the 
communication between these systems and the VRML scene. The reason for the 
universality of our solution for connecting VRML clients to external systems 
is that we additionally wanted to support potentially complex scenarios in 
which triggers or general active database systems are used to update VRML 
scenes. 

In many areas more work has to be done. In the following some of these 
tasks are enumerated: 

■ Up to now the developed components have only been tested 
against very simple scenarios. The components clearly have to be 
tested in real world application environments to gain more 
detailed insight about their robustness and usability. 

■ Only a simple notification mechanism has been implemented: In a 
message no information about the modified data is transferred to 
the VRML clients. Furthermore, it is not assured that a 
modification has really taken place. A system as described in 
Section 6.4, in which an external component tests the database if 
the triggering transaction has terminated regularly, could be 
implemented. 

■ TCP based protocols have been used for the communication 
between the components. It still remains to be examined if there is 
another protocol which has clear advantages over TCP in this 
environment. In particular, multicasting could be used to inform
the clients because all clients interested in a particular trigger
notification receive the same message upon trigger execution.

■ It has to be analyzed how a system based on the developed 
components reacts to a failure of one or more system components. 
The results of such an analysis could directly flow into the further
development of the components.

■ The general performance of systems based on the described 
components could be analyzed. For example, the time between 
the triggering database query and the actual screen update on the 
client could be measured. Furthermore it could be analyzed how 
many clients can be supported in parallel by the trigger server 
component. With these results the scope for the usage of the 
components could be determined more accurately. 
Abstract One of the obstacles that hinder the wide deployment of trigger
systems is the lack of tools that help users create trigger rules. As with
understanding and specifying database queries in SQL3, it is difficult
to visualize the meaning of trigger rules. Furthermore, it is even more
difficult to write trigger rules in a text-based trigger rule language
such as SQL3. In this paper, we propose TBE (Trigger-By-Example) to remedy
such problems in writing trigger rules by using QBE (Query-By-Example)
ideas. TBE is a graphical trigger rule specification language
and system that helps users understand and specify active database
triggers. TBE retains the benefits of QBE while extending its features to
support triggers. Hence, TBE is a useful tool for novice users to create
simple triggers in a visual and intuitive manner. Further, since TBE is
designed to hide the details of underlying trigger systems from users, it
can be used as a universal trigger interface.

Keywords: Visual Query Interface, Triggers, Active Database 

1. INTRODUCTION 

Triggers provide a facility to autonomously react to database events 
by evaluating a data-dependent condition and by executing a reaction 
whenever the condition is satisfied. Such triggers are regarded as an 
important database feature and implemented by most major database 
vendors. Despite their diverse potential uses, one of the obstacles that
hinder the wide deployment of triggers is the lack of tools that
help users create complex trigger rules in a simple manner. In many
environments, the correctness of the written trigger rules is crucial
since the semantics encoded in the trigger rules are shared by many
applications. Although the majority of trigger users are DBAs
or savvy end-users, writing correct and complex trigger rules is still a
daunting task, not to mention maintaining them.

On the other hand, QBE (Query-By-Example) has been very popular
since its introduction decades ago, and its variants are currently
used in most modern database products. As it is based on the domain
relational calculus, its expressive power has been proved to be equivalent to
that of SQL, which is based on the tuple relational calculus (Codd, 1972).
As opposed to SQL, where the user has to conform strictly to the phrase
structure, the QBE user may enter any expression as an entry insofar
as it is syntactically correct. That is, since the entries are bound to
the table skeleton, the user can only specify admissible queries (Zloof,
1977). We proposed TBE (Trigger-By-Example) (Lee et al., 99) as a
novel graphical interface for writing triggers. Since most trigger rules
are complex combinations of SQL statements, by using QBE as a user
interface for triggers the user can create only admissible trigger rules.
TBE uses QBE in a declarative fashion for writing the procedural trigger
rules (Cochrane et al., 1996). In this paper, we discuss the design and
implementation issues of TBE. Further, we present our design for making
TBE a universal trigger rule formation tool that hides much of the
peculiarity of the underlying trigger systems.

To facilitate discussion, we briefly review SQL3 triggers and
QBE in the following subsections.

1.1. SQL3 TRIGGERS 

In SQL3, triggers, sometimes called event-condition-action rules or
ECA rules, mainly consist of three parts that describe the event, condition,
and action, respectively. Since SQL3 is still evolving at the time of
writing this paper, albeit close to its finalization, we base our discussion
on the latest ANSI X3H2 SQL3 working draft (Melton, 1999). The
following is a definition of SQL3 triggers:

Example 1: SQL3 triggers definition. 

<SQL3-trigger> ::= CREATE TRIGGER <trigger-name>
                   {AFTER | BEFORE} <trigger-event> ON <table-name>
                   [REFERENCING <references>]
                   [FOR EACH {ROW | STATEMENT}]
                   [WHEN <SQL-statements>]
                   <SQL-procedure-statements>
<trigger-event> ::= INSERT | DELETE | UPDATE [OF <column-names>]
<reference> ::= OLD [AS] <old-value-tuple-name> |
                NEW [AS] <new-value-tuple-name> |
                OLD_TABLE [AS] <old-value-table-name> |
                NEW_TABLE [AS] <new-value-table-name>

1.2. QBE (QUERY-BY-EXAMPLE) 

QBE is a query language as well as a visual user interface. In QBE,
programming is done within two-dimensional skeleton tables. This is
accomplished by filling in an example of the answer in the appropriate table
spaces (thus the name “by-example”). Another kind of two-dimensional
object is the condition box, which is used to express one or more desired
conditions that are difficult to express in the skeleton tables. By QBE
convention, variable names are lowercase letters prefixed with “_”, system
commands are uppercase letters suffixed with “.”, and constants are
denoted without quotes, unlike in SQL3. Let us see a QBE example. The
following schema is used throughout the paper.

Example 2: Define the emp and dept relations, whose keys are Eno and Dno,
respectively. emp.DeptNo and dept.MgrNo are foreign keys referencing the
dept.Dno and emp.Eno attributes, respectively.

emp(Eno, Ename, DeptNo, Sal)

dept(Dno, Dname, MgrNo)

Then, Example 3 shows two equivalent representations of the query in 
SQL3 and QBE. 

Example 3: Who is being managed by the manager ’Tom’? 

SELECT E2.Ename
FROM emp E1, emp E2, dept D
WHERE E1.Ename = 'Tom' AND E1.Eno = D.MgrNo AND E2.DeptNo = D.Dno
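As a rough illustration, the SQL form of Example 3 can be run against the schema of Example 2 using SQLite from Python. The sample rows and the helper name managed_by_tom are assumptions made only for this sketch, and an ORDER BY clause has been added so that the output order is deterministic.

```python
import sqlite3

# Schema from Example 2; the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp  (Eno INTEGER PRIMARY KEY, Ename TEXT, DeptNo INTEGER, Sal INTEGER);
    CREATE TABLE dept (Dno INTEGER PRIMARY KEY, Dname TEXT, MgrNo INTEGER);
    INSERT INTO dept VALUES (10, 'Sales', 1), (20, 'Research', 3);
    INSERT INTO emp VALUES (1, 'Tom', 10, 900), (2, 'Ann', 10, 700),
                           (3, 'Bob', 20, 800), (4, 'Sue', 20, 600);
""")

# The query of Example 3 (ORDER BY added for a stable result order).
QUERY = """
    SELECT E2.Ename
    FROM emp E1, emp E2, dept D
    WHERE E1.Ename = 'Tom' AND E1.Eno = D.MgrNo AND E2.DeptNo = D.Dno
    ORDER BY E2.Ename
"""

def managed_by_tom(connection):
    """Return the names of all employees in departments managed by Tom."""
    return [row[0] for row in connection.execute(QUERY)]

print(managed_by_tom(conn))
```

With this sample data, Tom is himself a member of the department he manages, so his own name is part of the answer.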




The rest of this paper is organized as follows. Section 2 gives a brief 
introduction to TBE. Section 3 is a simulation of a user session with 
TBE. The design and implementation of TBE is discussed in Section 4. 
Section 5 presents the design of some extensions that we are planning for 
the TBE. Related work and concluding remarks are given in Sections 6 
and 7, respectively. 

2. TBE (TRIGGER-BY-EXAMPLE)

Triggers in SQL3 are procedural in nature. As shown in Example 1,
trigger actions can be arbitrary SQL procedural statements. Also, the
order among action statements needs to be obeyed faithfully to preserve
the correct semantics. In contrast, since QBE is a declarative query
language, the order is immaterial. Further, QBE is specifically designed
as a tool for only: 1) data retrieval queries (i.e., SELECT), 2) data
modification queries (i.e., INSERT, DELETE, UPDATE), and 3) schema
definition and manipulation queries. Thus, our goal is to develop a tool
that can represent the procedural SQL3 triggers in their entirety while
retaining the declarative nature of QBE as much as possible.

2.1. TBE MODEL 

SQL3 triggers use the ECA (Event, Condition and Action) model.
Therefore, triggers are represented by three independent E, C, and A
parts. In TBE, each E, C, and A part maps to the corresponding skeleton
tables and condition boxes separately. To differentiate among these three
parts, each skeleton table name is prefixed with its corresponding flag,
E., C., or A. The condition box in QBE is extended similarly. For
instance, a trigger condition statement can be specified in the C.-prefixed
skeleton table and/or condition box.




SQL3 triggers allow only INSERT, DELETE, and UPDATE as legal
event types. QBE uses I., D., and U. to describe the corresponding
data manipulations. TBE thus uses these constructs to describe the
trigger event types. Since INSERT and DELETE always affect the whole
tuple, not individual columns, I. and D. must be filled in the leftmost
column of the skeleton table. Since an UPDATE event can affect individual
columns, U. must be filled in the corresponding columns. Otherwise,
U. is filled in the leftmost column to represent that the UPDATE event is
monitored on all columns. Consider the following example.

Example 4: Skeleton tables (1) and (2) depict INSERT and DELETE events on
the dept table, respectively. (3) depicts an UPDATE event on columns Dname and MgrNo.
Thus, changes occurring on other columns do not fire the trigger. (4) depicts an UPDATE
event on any column of the dept table.




Note also that since the SQL3 trigger definition requires that one trigger
rule monitor only one event, there cannot be more than one row having
an I., D., or U. flag. Therefore, the same trigger action for different
events (e.g., “abort when either INSERT or DELETE occurs”) needs to
be expressed as separate trigger rules in SQL3.

2.2. TRIGGERS ACTIVATION TIME AND 
GRANULARITY 

The SQL3 triggers have notions of event activation time and granu- 
larity. Event activation time specifies whether the trigger is executed 
before or after its event. Granularity defines how many times the trigger 
is executed for a particular event. 

1 The activation time can have two modes, before and after. The
before mode triggers execute before their events and are useful for
conditioning of the input data. The after mode triggers execute
after their events and are typically used to embed application logic
(Cochrane et al., 1996). In TBE, two corresponding constructs
(BFR. and AFT.) are introduced to denote these modes. The
appended “.” denotes that these are built-in system commands by
QBE convention.

2 The granularity of a trigger can be specified as either for each
row or for each statement, referred to as row-level and statement-level
triggers, respectively. The row-level triggers are executed
once for each modification to a tuple whereas the statement-level
triggers are executed once for an event regardless of the number
of tuples affected. In TBE notation, R. and S. are used to denote
the row-level and statement-level triggers, respectively.

Users express trigger activation time and granularity at the leftmost 
column of the event skeleton tables using the introduced constructs. 
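The mapping from these leftmost-column constructs to SQL3 keywords can be sketched as follows. This is only an illustrative Python fragment; the function name parse_event_flags and the dictionary layout are assumptions of this sketch, not part of the TBE prototype.

```python
# TBE constructs and the SQL3 keywords they stand for (Sections 2.1 and 2.2).
ACTIVATION = {"BFR.": "BEFORE", "AFT.": "AFTER"}
EVENT = {"I.": "INSERT", "D.": "DELETE", "U.": "UPDATE"}
GRANULARITY = {"R.": "FOR EACH ROW", "S.": "FOR EACH STATEMENT"}

def parse_event_flags(cell):
    """Split a leftmost-column entry such as 'AFT.I.S.' into SQL3 keywords.

    Hypothetical helper: every construct ends with '.', so the cell is
    split on '.' and each token is looked up in the tables above.
    """
    result = {"time": None, "event": None, "granularity": None}
    for token in (part + "." for part in cell.split(".") if part):
        if token in ACTIVATION:
            result["time"] = ACTIVATION[token]
        elif token in EVENT:
            result["event"] = EVENT[token]
        elif token in GRANULARITY:
            result["granularity"] = GRANULARITY[token]
        else:
            raise ValueError(f"unknown construct: {token}")
    return result

print(parse_event_flags("AFT.I.S."))
```

When U. is placed in individual columns rather than in the leftmost column, the leftmost entry carries only the activation time and granularity (e.g., AFT.R.), and the event type is left undetermined by this helper.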

2.3. TRANSITION VALUES 

When an event occurs and values change, trigger rules often need to
refer to the before and after values of certain attributes. These values
are referred to as the transition values. In SQL3, these transition values
can be accessed by either transition variables (i.e., OLD, NEW) for row-level
triggers or tables (i.e., OLD_TABLE, NEW_TABLE) for statement-level
triggers. Furthermore, in SQL3, the INSERT event trigger can only
use NEW or NEW_TABLE while the DELETE event trigger can only use
OLD or OLD_TABLE to access transition values. However, the UPDATE
event trigger can use both old and new transition variables or tables. In TBE,
a couple of special built-in functions (i.e., OLD_TABLE() and NEW_TABLE() for
statement-level, OLD() and NEW() for row-level) are introduced. The
OLD_TABLE() and NEW_TABLE() functions return a set of tuples with
values before and after the changes, respectively. Similarly, the OLD() and
NEW() functions return a single tuple with values before and after the
change, respectively. Therefore, applying aggregate functions such as
CNT. or SUM. to OLD() or NEW() is meaningless (i.e., CNT.NEW(_s) is
always 1 and SUM.OLD(_s) is always the same as _s). Using these built-in
functions, for instance, an event “every time more than 10 new employees
are inserted” can be represented as follows:



E.emp    | Eno | Ename | DeptNo | Sal
AFT.I.S. | _n  |       |        |

E.conditions
CNT.ALL.NEW_TABLE(_n) > 10



When arbitrary SQL procedural statements (i.e., IF, CASE, assignment
statements, etc.) are written in the action part of the trigger rules, it
is not straightforward to represent them in TBE due to their procedural
nature. Because their expressive power is beyond what the declarative
QBE (and thus TBE as described so far) can achieve, we instead provide a
special kind of box, called a statement box, similar to the condition box.
The user can write arbitrary SQL procedural statements delimited by
“;” in the statement box. Since the statement box is only allowed for
the action part of the triggers, the prefix A. is always prepended. An
example is:

A.statements
IF (X > 10)
    ROLLBACK;



2.4. TBE EXAMPLES 



Let us wrap up this section with two illustrative examples. These are
typical trigger rules to maintain database integrity constraints.

Example 5: When a manager is deleted, all employees in his or her department are 
deleted too. 



CREATE TRIGGER ManagerDelRule AFTER DELETE ON emp
FOR EACH ROW
DELETE FROM emp E WHERE E.DeptNo IN
    (SELECT D.Dno FROM dept D WHERE D.MgrNo = OLD.Eno)



E.emp    | Eno | Ename | DeptNo | Sal
AFT.D.R. | _e  |       |        |

A.dept | Dno | Dname | MgrNo
       | _d  |       | _e

A.emp | Eno | Ename | DeptNo | Sal
D.    |     |       | _d     |



In this example, the WHEN clause is missing on purpose. That is, the trigger rule does
not check whether the deleted employee is in fact a manager because the rule deletes
only the employees whose manager has just been deleted. Note how the _e variable is used to
join the emp and dept tables to find the department whose manager has just been deleted.
The same rule could have been written with a condition test in a more explicit
manner as follows:



E.emp    | Eno | Ename | DeptNo | Sal
AFT.D.R. | _e  |       |        |

C.dept | Dno | Dname | MgrNo
       | _d  |       | _m

C.conditions
OLD(_e) = _m

A.emp | Eno | Ename | DeptNo | Sal
D.    |     |       | _d     |
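The behaviour of the rule in Example 5 can be tried out with a small SQLite script driven from Python. This is only a sketch under stated assumptions: SQLite stands in for an SQL3 engine (it supports row-level triggers only and requires a BEGIN ... END body), the table alias has been dropped from the DELETE statement, and the sample rows and the function name run_manager_delete_demo are hypothetical.

```python
import sqlite3

def run_manager_delete_demo():
    """Delete the manager 'Tom' and return the employees that remain."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE emp  (Eno INTEGER PRIMARY KEY, Ename TEXT, DeptNo INTEGER, Sal INTEGER);
        CREATE TABLE dept (Dno INTEGER PRIMARY KEY, Dname TEXT, MgrNo INTEGER);
        -- Hypothetical sample data: Tom manages department 10, Bob department 20.
        INSERT INTO dept VALUES (10, 'Sales', 1), (20, 'Research', 3);
        INSERT INTO emp VALUES (1, 'Tom', 10, 900), (2, 'Ann', 10, 700),
                               (3, 'Bob', 20, 800), (4, 'Sue', 20, 600);

        -- Example 5, adapted to SQLite syntax.
        CREATE TRIGGER ManagerDelRule AFTER DELETE ON emp
        FOR EACH ROW
        BEGIN
            DELETE FROM emp WHERE DeptNo IN
                (SELECT Dno FROM dept WHERE MgrNo = OLD.Eno);
        END;
    """)
    conn.execute("DELETE FROM emp WHERE Ename = 'Tom'")
    rows = conn.execute("SELECT Ename FROM emp ORDER BY Ename").fetchall()
    return [name for (name,) in rows]

print(run_manager_delete_demo())
```

Deleting Tom fires the trigger once, which removes the remaining employee of department 10; employees of department 20 are untouched because their manager was not deleted.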







Example 6: When employees are inserted into the emp table, abort the transaction if
there is one violating the foreign key constraint.

CREATE TRIGGER AbortEmp AFTER INSERT ON emp
FOR EACH STATEMENT
WHEN EXISTS (SELECT * FROM NEW_TABLE E WHERE NOT EXISTS
    (SELECT * FROM dept D WHERE D.Dno = E.DeptNo))
ROLLBACK





E.emp    | Eno | Ename | DeptNo | Sal
AFT.I.S. |     |       | _d     |

C.dept | Dno | Dname | MgrNo
¬      | _d  |       |

A.statements
ROLLBACK



In this example, if the granularity were R. instead of S., then TBE would generate
a slightly different SQL3 trigger rule. That is, a row-level trigger rule
generated from the same TBE representation would have been:

CREATE TRIGGER AbortEmp AFTER INSERT ON emp 
FOR EACH ROW 

WHEN NOT EXISTS (SELECT * FROM dept D WHERE D.Dno = NEW.DeptNo) 
ROLLBACK 

Please refer to (Lee et al., 99) for detailed discussion and more examples 
of TBE. 

3. A TBE SESSION EXAMPLE 

To give a flavor of TBE, we describe a sample session in this section. 
Consider the following example. 

Example 7: When an employee’s salary is changed more than twice within the same
year (a variable CURRENT_YEAR contains the current year value), record new values
of Eno and Sal into the log(Eno, Sal) table. Assume that there is another table
sal-change(Eno, Year, Cnt) that keeps track of the employee’s salary changes.

Without TBE, a human expert would have written the following trigger
rule:



CREATE TRIGGER TwiceSalaryRule
AFTER UPDATE OF Sal ON emp
FOR EACH ROW
WHEN EXISTS (SELECT * FROM sal-change WHERE
    Eno = NEW.Eno AND Year = CURRENT_YEAR AND Cnt >= 2)
BEGIN ATOMIC
    UPDATE sal-change SET Cnt = Cnt + 1
        WHERE Eno = NEW.Eno AND Year = CURRENT_YEAR;
    INSERT INTO log VALUES(NEW.Eno, NEW.Sal);
END
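To see what this rule does on concrete data, it can be approximated in SQLite from Python. The following is a sketch under several assumptions that are not part of the original rule: SQLite replaces the SQL3 engine (BEGIN ... END instead of BEGIN ATOMIC), the table is named sal_change because an unquoted hyphen is not a valid identifier character there, CURRENT_YEAR is fixed to a constant, and the sample rows and the function name run_salary_demo are hypothetical.

```python
import sqlite3

CURRENT_YEAR = 2000  # fixed stand-in for the CURRENT_YEAR variable of Example 7

def run_salary_demo():
    """Raise every salary once and return the (log, sal_change) contents."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(f"""
        CREATE TABLE emp (Eno INTEGER PRIMARY KEY, Ename TEXT, DeptNo INTEGER, Sal INTEGER);
        CREATE TABLE sal_change (Eno INTEGER, Year INTEGER, Cnt INTEGER);
        CREATE TABLE log (Eno INTEGER, Sal INTEGER);
        -- Hypothetical data: employee 1 already has two changes this year, employee 2 has one.
        INSERT INTO emp VALUES (1, 'Tom', 10, 900), (2, 'Ann', 10, 700);
        INSERT INTO sal_change VALUES (1, {CURRENT_YEAR}, 2), (2, {CURRENT_YEAR}, 1);

        CREATE TRIGGER TwiceSalaryRule
        AFTER UPDATE OF Sal ON emp
        FOR EACH ROW
        WHEN EXISTS (SELECT * FROM sal_change WHERE
            Eno = NEW.Eno AND Year = {CURRENT_YEAR} AND Cnt >= 2)
        BEGIN
            UPDATE sal_change SET Cnt = Cnt + 1
                WHERE Eno = NEW.Eno AND Year = {CURRENT_YEAR};
            INSERT INTO log VALUES (NEW.Eno, NEW.Sal);
        END;
    """)
    conn.execute("UPDATE emp SET Sal = Sal + 100")
    log = conn.execute("SELECT Eno, Sal FROM log").fetchall()
    counts = conn.execute("SELECT Eno, Cnt FROM sal_change ORDER BY Eno").fetchall()
    return log, counts

print(run_salary_demo())
```

Only the employee whose Cnt value has already reached 2 satisfies the WHEN condition, so only that employee is logged and has the counter increased.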



Figure 1 Initial screen.



Initially, TBE looks like Figure 1. Descriptions on the panel are only 
added for explanation purposes. The main screen consists of two sections 
- one for input and the other for output. The input section is where the 
user creates trigger rules by a QBE mechanism and the output section 
is where the interface generates trigger rules in the target trigger syntax 
(default is SQL3). Further, the input section consists of three panels 
for event, condition, and action, respectively. The user first chooses the 
target system. Then, TBE adjusts its behavior according to the specifics of the
selected target system. The current implementation supports only SQL3
triggers.

At its start-up time, TBE first loads schema information and keeps 
table, attribute, and type related information. This information is used 
to guide users to write only admissible trigger rules. For instance, when 
the user tries to insert an empty skeleton table at one of the three panels,
TBE shows all the available table names to aid in the user’s selection. 
After the user picks the table, an empty table appears in the currently 
active panel. 

In our example, the user first creates the trigger event. From the query
description, the user knows that the activation time and the granularity
of the trigger are “after” and “for each row”, respectively. Furthermore,
the Sal attribute needs to be monitored for the “update” event
(Figure 2). All these commands are provided by TBE and can be chosen
from the pop-up menu.



E.emp  | Eno | Ename | DeptNo | Sal
AFT.R. |     |       |        | U.



Figure 2 Event construction. 

Next, the user constructs the trigger condition - “salary is changed
more than twice within the same year”. To do this, the user can use the
fact that “when an employee’s salary is updated, if the Cnt attribute of
the sal-change of the same person has a value greater than or equal to
2 within the same year, then this update event satisfies the condition”.
Since the emp table needs to be joined with the sal-change table to find
the candidate employees, the user puts the variable _n in the key attribute
(i.e., Eno) of the emp table (Figure 3).



E.emp  | Eno | Ename | DeptNo | Sal
AFT.R. | _n  |       |        | U.



Figure 3 A variable inserted at key attribute. 

In the sal-change table, to specify the same year, CURRENT_YEAR is
inserted at the Year attribute. In addition, to refer to the Cnt value later, a
new variable _c is inserted. Finally, the join condition between the emp and
sal-change tables is expressed by entering the variable _n in the Eno
attribute of the sal-change table (i.e., equi-join). After the user constructs
the “changed more than twice” phrase using the special condition box, the
resulting TBE is shown in Figure 4.

To facilitate user rule specification, TBE provides the user with all the 
valid context-sensitive options available for the user to select. For in- 
stance, when the user right-clicks after positioning the cursor in the Eno 
attribute, a pop-up menu appears (Figure 5). 

Figure 4 Condition construction.

Figure 5 Pop-up menu.

Now, the user constructs the trigger action. Two actions are required
according to the query description: 1) the system maintains the Cnt value in
the sal-change table, and 2) the system logs the information of the employee
whose salary has been changed more than twice within the same year.
Since the two actions operate on different tables, the user creates two empty
skeleton tables at the action panel. Then, using the variable _n defined
in the emp table, the user increases the Cnt value by one (Figure 6).





Figure 6 Action construction for the sal-change table.

Second, the user needs to insert the employee’s number and new
salary into the log table. The user enters another variable in the Sal
attribute of the emp table to refer to the employee’s salary value.
Furthermore, to retrieve the new salary value after an update, the user uses
the NEW() function explicitly (Figure 7).

Finally, after the user clicks the down-arrow button to generate the
SQL3 trigger rule, the corresponding rule in SQL3 triggers syntax is
generated at the output section. Figure 8 shows the final screen after
rule generation.

Figure 7 Action construction for the log table.

Figure 8 Final screen.

4. DESIGN AND IMPLEMENTATION 
ISSUES 

In this section, we discuss some of the interesting aspects of the TBE
implementation. A preliminary version of the TBE prototype is being
implemented in Java using JDK 1.2.1 and Swing 1.1. The main issues that
we encountered in designing and implementing TBE are:

■ How to represent TBE internally? 

■ How to implement the translation algorithm? 

4.1. INTERNAL REPRESENTATION 

Each of the three panels in the GUI (event, condition, and action)
holds a vector of tables as created by the user. Before passing the vectors
to the translation module, the GUI processes sets (i.e., the “[ ]” notation in
QBE), removing bracketed entries and replacing them with constants
and simple example elements. The modified tables are then used to
create internal representations of the tables for the translation module
(called TBETables). Each TBETable contains the column headers and a vector of
non-empty fields. Other useful information, such as each field’s row and
column, is stored as well.

The whole session of TBE can be stored on disk using Java’s
serialization feature. Therefore, the current implementation uses the TBETable
as the in-memory representation and the serialized object as the on-disk
representation of TBE.

For each clause and various checks in the translation algorithm, a
linear iteration through the TBETables is required. That is, each
scan costs O(N * M), where N is the total number of rows in all
TBETables and M is the average number of non-empty fields in the
rows. Since the size of a trigger rule is relatively small, this is not a
serious performance problem. One might reduce the constant factor
by performing multiple tasks in a single iteration, but this comes at a cost
to modularity.
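The internal representation described above might look roughly like the following. The prototype itself is written in Java; this Python sketch only mirrors the description (column headers plus a vector of non-empty fields with their row and column), and the names Entry, TBETable, and find_fields are assumptions of the sketch. The tables encode the first TBE form of Example 5, with column 0 taken to be the table-name column.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Entry:
    """One non-empty field of a skeleton table, with its position."""
    row: int
    column: int   # 0 is the table-name (leftmost) column in this sketch
    text: str

@dataclass
class TBETable:
    """Hypothetical counterpart of a TBETable: headers plus non-empty fields."""
    name: str                 # e.g. "E.emp"
    columns: List[str]        # column headers
    entries: List[Entry] = field(default_factory=list)

def find_fields(tables: List[TBETable],
                predicate: Callable[[Entry], bool]) -> List[Tuple[str, str]]:
    """One linear scan over all non-empty fields of all tables."""
    return [(t.name, e.text) for t in tables for e in t.entries if predicate(e)]

# Example 5, first TBE form.
EXAMPLE_5 = [
    TBETable("E.emp", ["Eno", "Ename", "DeptNo", "Sal"],
             [Entry(0, 0, "AFT.D.R."), Entry(0, 1, "_e")]),
    TBETable("A.dept", ["Dno", "Dname", "MgrNo"],
             [Entry(0, 1, "_d"), Entry(0, 3, "_e")]),
    TBETable("A.emp", ["Eno", "Ename", "DeptNo", "Sal"],
             [Entry(0, 0, "D."), Entry(0, 3, "_d")]),
]

# Collect all example variables (tokens starting with "_").
print(find_fields(EXAMPLE_5, lambda e: e.text.startswith("_")))
```

Each call to find_fields is one of the linear scans whose cost is discussed above; separate checks are kept as separate scans here for the sake of modularity.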

4.2. TRANSLATION ALGORITHM 

Our algorithm is an extension of the algorithm by (McLeod, 1976),
which translates from QBE to SQL. Its input is a list of skeleton
tables and condition boxes, while its output is an SQL query string.
Let us denote McLeod’s algorithm as qbe2sql(<input>) and ours as
tbe2triggers.

4.2.1 The qbe2sql Algorithm. We have implemented the basic
features of the qbe2sql algorithm in (McLeod, 1976), except queries having
the GROUP-BY construct. The algorithm first determines the type
of query statement. The basic cases involve operators such as SELECT,
UPDATE, INSERT, and DELETE. Special cases use UNION, EXCEPT,
and INTERSECT, where the statements are processed recursively. The general
steps of the translation implemented in TBE are as follows:

1 Duplicate tables are renamed (e.g., “FROM supply, supply” is
converted into “FROM supply S1, supply S2”).

2 The SELECT clause (or other type) is printed by searching through
the TBETables’ fields for projection (i.e., the P. command). Then, the FROM
clause is printed from the TBETable table names.

3 Example variables are extracted from the TBETables by searching for
tokens starting with “_”. Variables with the same names indicate table
joins; the table names and corresponding column names of the
variables are stored.

4 Conditions are processed; variables are matched with previously extracted
variables and replaced with the corresponding table and column names
(e.g., a variable _n at column Eno of the table emp is replaced with
emp.Eno). Constants are handled accordingly as well.
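The four steps above can be sketched for the simplest case, a SELECT query without condition boxes. The following Python fragment is a deliberately reduced illustration and not McLeod's algorithm or the TBE implementation: the input format (a list of table name and column-to-entry pairs), the alias scheme, and the treatment of every constant as a quoted string are assumptions of the sketch. The QBE encoding of Example 3 used as input is likewise an assumed one.

```python
from collections import Counter

def qbe2sql(rows):
    """Translate skeleton-table rows into a SELECT string (toy version).

    rows: list of (table_name, {column: entry}) pairs, one per skeleton row.
    """
    # Step 1: rename duplicate tables (emp, emp -> emp E1, emp E2).
    counts = Counter(table for table, _ in rows)
    seen = Counter()
    aliases = []
    for table, _ in rows:
        if counts[table] > 1:
            seen[table] += 1
            aliases.append(f"{table[0].upper()}{seen[table]}")
        else:
            aliases.append(table)

    select, where, first_use = [], [], {}
    for (table, cells), alias in zip(rows, aliases):
        for column, entry in cells.items():
            ref = f"{alias}.{column}"
            # Step 2: P. marks a projected column.
            if entry.startswith("P."):
                select.append(ref)
                entry = entry[2:]
            if not entry:
                continue
            if entry.startswith("_"):
                # Steps 3 and 4: a repeated variable denotes an equi-join.
                if entry in first_use:
                    where.append(f"{first_use[entry]} = {ref}")
                else:
                    first_use[entry] = ref
            else:
                # Constants become equality tests (quoted as strings here).
                where.append(f"{ref} = '{entry}'")

    from_items = [t if t == a else f"{t} {a}" for (t, _), a in zip(rows, aliases)]
    sql = f"SELECT {', '.join(select)} FROM {', '.join(from_items)}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql

# Assumed QBE encoding of Example 3.
EXAMPLE_3 = [
    ("emp", {"Eno": "_e", "Ename": "Tom"}),
    ("emp", {"Ename": "P.", "DeptNo": "_d"}),
    ("dept", {"Dno": "_d", "MgrNo": "_e"}),
]
print(qbe2sql(EXAMPLE_3))
```

The generated string is equivalent to the SQL of Example 3 up to the order of the conjuncts and the missing alias for dept.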

4.2.2 The tbe2triggers Algorithm. Let us assume that _var is
an example variable filled in some column of the skeleton table and that
colname(_var) is a function that returns the column name given the variable
name _var. Skeleton tables and condition or statement boxes are collectively
called entries.

1 Preprocessing: This step does two tasks: 1) reducing the TBE query to
an equivalent but simpler form by moving the condition box entries
to the skeleton tables, and 2) partitioning the TBE query into
distinct groups when multiple trigger rules are written together.
This can be done by comparing the variables filled in the skeleton
tables and collecting those entries that use the same variables
in the same group. Then, apply the following steps 2, 3, and 4 to
each distinct group repeatedly to generate separate trigger rules.

2 Build event clause: Input all the E.-prefixed entries. The “CREATE
TRIGGER <trigger-name>” clause is generated from the trigger name
<trigger-name> filled in the name box. By checking the constructs
(e.g., AFT., R.), the system can determine the activation time and
granularity of the triggers. The event type can also be detected
from the constructs (e.g., I., D., U.). If U. is found in the individual
columns, then the “AFTER UPDATE OF <column-names>” clause is
generated by enumerating all column names in an arbitrary order.
Then,

(a) Convert all variables _var_i used with the I. event into NEW(_var_i)
(if row-level) or NEW_TABLE(_var_i) (if statement-level)
accordingly.

(b) Convert all variables _var_i used with the D. event into OLD(_var_i)
(if row-level) or OLD_TABLE(_var_i) (if statement-level)
accordingly.

(c) If there is a condition box or a column having comparison
operators (e.g., <, >) or aggregation operators (e.g., AVG.,
SUM.), gather all the related entries and pass them over to
step 3.

3 Build condition clause: Input all the C.-prefixed entries as well as
the E.-prefixed entries passed from the previous step.

(a) Convert all built-in functions for transition values and aggregate
operators into SQL3 format. For instance, OLD(_var)
and SUM._var are converted into OLD.name and SUM(name),
respectively, where name = colname(_var).

(b) Fill the P. command in the table name column (i.e., the leftmost one)
of all the C.-prefixed entries unless they already contain P.
commands. This will result in creating a “SELECT table_1.*,
..., table_n.* FROM table_1, ..., table_n” clause.

(c) Gather all entries into the <input> list and call the qbe2sql(<input>)
algorithm. Let the returned SQL string be <condition-statement>.
For row-level triggers, create a “WHEN EXISTS (<condition-statement>)”
clause. For statement-level triggers, create a “WHEN EXISTS
(SELECT * FROM NEW_TABLE (or OLD_TABLE) WHERE (<condition-statement>))”
clause.

4 Build action clause: Input all the A.-prefixed entries.

(a) Convert all built-in functions for transition values and aggregate
operators into SQL3 format as in step 3.(a).

(b) Partition the entries into distinct groups. That is, gather
entries that use identical variables in the same group.
Each group will have one data modification statement such as
INSERT, DELETE, or UPDATE. Preserve the order among
partitioned groups.

(c) For each group G_i, call the qbe2sql(<G_i>) algorithm according
to the order in step 4.(b). Let the resulting SQL string
for G_i be <action-statement>_i. The contents of the statement
box are literally copied to <action-statement>_i. Then, the
final action statements for triggers would be “BEGIN ATOMIC
<action-statement>_1; ...; <action-statement>_n; END”.
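How the clauses produced by steps 2, 3.(c), and 4.(c) fit together can be sketched in a few lines of Python. This is only an illustration of the final assembly, assuming the condition and action strings have already been produced by qbe2sql; the function name build_trigger, its parameters, and the choice of OLD_TABLE for DELETE events (NEW_TABLE otherwise) in the statement-level case are assumptions of the sketch.

```python
def build_trigger(name, time, event, table, granularity,
                  condition_sql, action_sqls, columns=()):
    """Compose an SQL3 trigger string from already translated clauses.

    granularity is "ROW" or "STATEMENT"; condition_sql may be None.
    """
    # Step 2: event clause, with an OF list when U. was put in columns.
    of_clause = f" OF {', '.join(columns)}" if event == "UPDATE" and columns else ""
    lines = [
        f"CREATE TRIGGER {name}",
        f"{time} {event}{of_clause} ON {table}",
        f"FOR EACH {granularity}",
    ]
    # Step 3.(c): wrap the condition statement in WHEN EXISTS.
    if condition_sql:
        if granularity == "ROW":
            lines.append(f"WHEN EXISTS ({condition_sql})")
        else:
            # Assumption of this sketch: DELETE uses OLD_TABLE, others NEW_TABLE.
            transition = "OLD_TABLE" if event == "DELETE" else "NEW_TABLE"
            lines.append(
                f"WHEN EXISTS (SELECT * FROM {transition} WHERE ({condition_sql}))")
    # Step 4.(c): action statements, in their original order.
    lines.append("BEGIN ATOMIC " + "; ".join(action_sqls) + "; END")
    return "\n".join(lines)

print(build_trigger(
    "TwiceSalaryRule", "AFTER", "UPDATE", "emp", "ROW",
    "SELECT * FROM sal_change WHERE Eno = NEW.Eno AND Cnt >= 2",
    ["UPDATE sal_change SET Cnt = Cnt + 1 WHERE Eno = NEW.Eno",
     "INSERT INTO log VALUES(NEW.Eno, NEW.Sal)"],
    columns=["Sal"]))
```

Keeping the assembly separate from the translation of the individual clauses means that only this last step would have to change for a different target syntax.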

5. TBE AS A UNIVERSAL TRIGGER RULE 
FORMATION TOOL 

At present, TBE supports only SQL3 triggers syntax. Although SQL3
is close to its final form, many database vendors are already shipping
their products with their own proprietary triggers syntax. When multiple
databases are interconnected or one database is integrated with another,
these diversities can introduce significant problems. To remedy
this problem, one can use TBE as a universal triggers construction tool.
The user can create trigger rules using the TBE interface and save them in
TBE’s internal format. When there is a need to change from one database to
another, the user can reset the target system (e.g., from Oracle to DB2)
to re-generate new trigger rules.

Ideally, we would like to be able to add new types of database triggers in a
declarative fashion. That is, given a new triggers system, a user needs
only to describe what kind of syntax the triggers use. Then, TBE should
be able to generate the target trigger rules without further intervention
from the user. Two inputs to TBE are needed to add new database
triggers: a trigger syntax rule and a trigger composition rule. In a trigger
syntax rule, a detailed description of the syntactic aspect of the triggers
is encoded in the declarative language. In a trigger composition rule,
information as to how to compose the trigger rule (i.e., English sentence)
using the trigger syntax rule is specified. The behavior and output of TBE
conform to the specifics defined in the meta rules of the selected target
trigger system. When a user chooses the target trigger system in the
interface, the corresponding trigger syntax and composition rules are loaded
from the meta rule database into the TBE system. The high-level overview
is shown in Figure 9.

5.1. TRIGGER SYNTAX RULE 

TBE provides a declarative language to describe trigger syntax, whose 
EBNF is shown below: 

<Trigger-Syntax-Rule> ::= <event-rule> | <condition-rule> | <action-rule>
<event-rule> ::= 'event' 'has' <event-rule-entry> (',' <event-rule-entry>)*
<event-rule-entry> ::= <structure-operation> 'on' ('row' | 'attribute') |
                       <activation-time> | <granularity> | <evaluation-time>
<structure-operation> ::= ('I.' | 'D.' | 'U.' | 'RT.') 'as' <value>
<activation-time> ::= ('BFR.' | 'AFT.' | 'ISTD.') 'as' <value>
<granularity> ::= ('R.' | 'S.') 'as' <value>
<value> ::= <identifier> | ' <identifier> ' | 'null' | 'true'
<condition-rule> ::= 'condition' 'has' <condition-rule-entry> (',' <condition-rule-entry>)*
<condition-rule-entry> ::= <condition-role> | <condition-context>
<condition-role> ::= 'role' 'as' ('mandatory' | 'optional')
<condition-context> ::= 'context' 'as'
                        '(' ('NEW' | 'OLD' | 'NEW_TABLE' | 'OLD_TABLE') 'as' <value> ')'
<action-rule> ::= 'action' 'has' <action-rule-entry> (',' <action-rule-entry>)* ';'
<action-rule-entry> ::= <structure-operation> | <evaluation-time>
<evaluation-time> ::= ('DFR.' | 'IMM.' | 'DTC.') 'as' <value>

Figure 9 The architecture of TBE as a universal triggers construction tool.

Although the detailed discussion of the language constructs is beyond 
the scope of this paper, the essence of the language has the form 
“command as value”, meaning the trigger feature command is supported 
and represented by the keyword value. For instance, a clause NEW_TABLE 
as INSERTED for the Starburst system would mean that “Starburst supports 
statement-level triggering and uses the keyword INSERTED to access 
transition values”. 

Example 8: SQL3 trigger syntax can be described as follows: 

event has ( 
    I. as INSERT on row, D. as DELETE on row, U. as UPDATE on attribute, 
    BFR. as BEFORE, AFT. as AFTER, R. as ROW, S. as STATEMENT 
); 

condition has ( 
    role as optional, 
    transition as (NEW as NEW, OLD as OLD, 
                   NEW_TABLE as NEW_TABLE, OLD_TABLE as OLD_TABLE) 
); 

action has ( 
    I. as INSERT, D. as DELETE, U. as UPDATE 
); 



The interpretation of this meta rule should be self-describing. For 
instance, the fact that there is no clause RT. as ... implies that SQL3 
triggers do not support event monitoring on the selection (retrieve) 
operation. In addition, the clause S. as STATEMENT implies that SQL3 
triggers support table-level (statement-level) event monitoring using 
the keyword ’FOR EACH STATEMENT’. 
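
The “command as value” reading of Example 8 can be illustrated with a small sketch. It is not part of TBE: the helper names parse_entries and supports are hypothetical, and the parser only handles the flat entry lists of the event and action clauses, not the nested transition clause.

```python
import re

# One entry has the form "<command> as <value>", optionally followed by
# "on row" or "on attribute" (see the event rule in Example 8).
ENTRY = re.compile(
    r"^(?P<cmd>\S+)\s+as\s+(?P<value>\S+)(?:\s+on\s+(?P<scope>row|attribute))?$"
)

def parse_entries(body: str) -> dict:
    """Map each TBE command to the keyword of the target trigger system."""
    mapping = {}
    for entry in body.split(","):
        entry = entry.strip()
        if not entry:
            continue
        match = ENTRY.match(entry)
        if match is None:
            raise ValueError(f"not a 'command as value' entry: {entry!r}")
        mapping[match.group("cmd")] = match.group("value")
    return mapping

def supports(rule: dict, command: str) -> bool:
    """A feature is supported only if the rule has a clause for its command."""
    return command in rule

# Body of the SQL3 event rule, copied from Example 8.
SQL3_EVENT = (
    "I. as INSERT on row, D. as DELETE on row, U. as UPDATE on attribute, "
    "BFR. as BEFORE, AFT. as AFTER, R. as ROW, S. as STATEMENT"
)
event_rule = parse_entries(SQL3_EVENT)
```

Under this reading, supports(event_rule, "RT.") is false because the SQL3 rule has no RT. clause, while event_rule["S."] yields the keyword STATEMENT.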

A partial comparison of the trigger syntax of the SQL3, Starburst, 
Postgres, Oracle and DB2 systems is shown in Table 1. Using the language 
constructs defined above, each of these syntaxes can be easily encoded 
into a trigger syntax rule. Note that our language is limited to triggers 
based on the ECA (event-condition-action) paradigm and the relational 
data model. 



TBE        SQL3       Starburst              Postgres  Oracle  DB2 
---------  ---------  ---------------------  --------  ------  --------- 
I.         INSERT     INSERTED               INSERT    INSERT  INSERT 
D.         DELETE     DELETED                DELETE    DELETE  DELETE 
U.         UPDATE     UPDATED                UPDATE    UPDATE  UPDATE 
RT.        N/A        N/A                    RETRIEVE  N/A     N/A 
BFR.       BEFORE     N/A                    N/A       BEFORE  BEFORE 
AFT.       AFTER      true                   true      AFTER   AFTER 
ISTD.      N/A        N/A                    INSTEAD   N/A     N/A 
R.         ROW        N/A                    TUPLE     ROW     ROW 
S.         STATEMENT  true                   N/A       true    STATEMENT 
NEW        NEW        N/A                    NEW       NEW     NEW 
OLD        OLD        N/A                    CURRENT   OLD     OLD 
NEW_TABLE  NEW_TABLE  INSERTED, NEW-UPDATED  N/A       N/A     NEW_TABLE 
OLD_TABLE  OLD_TABLE  DELETED, OLD-UPDATED   N/A       N/A     OLD_TABLE 

Table 1 Syntax comparison of five triggers using the trigger syntax rule. The leftmost 
column contains TBE commands while other columns contain equivalent keywords of 
the corresponding trigger system. “N/A” means the feature is not supported and 
“true” means the feature is supported by default. 



5.2. TRIGGER COMPOSITION RULE 

After the syntax is encoded, TBE still needs information on how to 
compose English sentences for trigger rules. This logic is specified in the 
trigger composition rule. In a trigger composition rule, a macro variable 
is enclosed in $ signs and is substituted with an actual value at rule 
generation time. 

Example 9: The following is a SQL3 trigger composition rule: 

CREATE TRIGGER $trigger-name$ 
$activation-time$ $structure-operation$ ON $table$ 
FOR EACH $granularity$ 
WHEN $condition-statement$ 
BEGIN ATOMIC 
$action-statement$ 
END 

At rule generation time, for instance, the variable $activation-time$ is 
replaced with either BEFORE or AFTER, since those are the only two valid 
values according to the trigger syntax rule in Example 8. In addition, 
the variables $condition-statement$ and $action-statement$ are replaced 
with statements generated by the translation algorithm in Section 4.2. 
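
The macro substitution just described can be sketched as follows. This is an illustration only, not the TBE implementation: the function compose and all bound values (trigger name, table, condition and action statements) are hypothetical placeholders.

```python
import re

# The SQL3 trigger composition rule of Example 9.
SQL3_COMPOSITION_RULE = """CREATE TRIGGER $trigger-name$
$activation-time$ $structure-operation$ ON $table$
FOR EACH $granularity$
WHEN $condition-statement$
BEGIN ATOMIC
$action-statement$
END"""

# A macro variable is a name made of letters and hyphens between two $ signs.
MACRO = re.compile(r"\$([A-Za-z][A-Za-z-]*)\$")

def compose(template: str, values: dict) -> str:
    """Replace every $name$ in the template with its bound value."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value bound for macro ${name}$")
        return values[name]
    return MACRO.sub(substitute, template)

# Hypothetical bindings; in TBE the last two would come from the
# translation algorithm, and the others from the trigger syntax rule.
trigger_text = compose(SQL3_COMPOSITION_RULE, {
    "trigger-name": "audit_salary",
    "activation-time": "AFTER",
    "structure-operation": "UPDATE",
    "table": "emp",
    "granularity": "ROW",
    "condition-statement": "NEW.sal > OLD.sal",
    "action-statement": "INSERT INTO log VALUES (NEW.eno);",
})
```

Raising an error for an unbound macro, rather than leaving it in the output, is a design choice of this sketch so that an incomplete rule never produces a malformed trigger.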



6. RELATED WORK 

Past active database research has focused on active database rule 
languages (Agrawal and Gehani, 1989), rule execution semantics (Cochrane 
et al., 1996), or rule management and system architecture issues (Simon 
and Kotz-Dittrich, 1995). In addition, research on visual querying has 
been done in traditional database research (Embley, 1989; Zloof, 1977). 
To a greater or lesser extent, all of this research has focused on 
devising novel visual querying schemes to replace the data retrieval 
aspects of the SQL language. Although some have considered data 
definition aspects (Collet and Brunet, 1992) or manipulation aspects, 
none have extensively considered the trigger aspects of SQL, especially 
from the user interface point of view. 

Other work (e.g., IFO2 (Teisseire et al., 1994), IDEA (Ceri et al., 
1996)) has attempted to build graphical trigger description tools, too. 
Using IFO2, one can describe how different objects interact through 
events, thus giving priority to an overview of the system. Argonaut 
from the IDEA project (Ceri et al., 1996) focused on the automatic 
generation of active rules that correct integrity violations based on 
declarative integrity constraint specifications, and of active rules 
that incrementally maintain materialized views based on view 
definitions. TBE, on the other hand, tries to help users directly design 
active rules with minimal learning. 

Other than QBE skeleton tables, forms have been popular building 
blocks for visual querying mechanisms as well. For instance, (Embley, 
1989) proposes the NFQL as a communication language between humans 
and database systems. It uses forms in a strictly nonprocedural manner 
to represent queries. Other work using forms focused on the querying 
aspect of the visual interface (Collet and Brunet, 1992). To the best of 
our knowledge, the only work that is directly comparable to ours is RBE 
(Chang and Chen, 1997). TBE is different from RBE in the following 
aspects: 



■ Since TBE is designed with SQL3 triggers in mind, it is capable 
of creating all the complex SQL3 trigger rules. Since RBE’s 
capability is limited to OPS5-style production rules, it cannot express 
the subtle differences in trigger activation time or granularity. 



■ The implementation of RBE is tightly coupled with the underlying 
rule system and database so that it cannot easily support multiple 
heterogeneous database triggers. Since TBE implementation is a 
thin layer utilizing a translation from a visual representation to 
the underlying triggers, it is loosely coupled with the database. 

7. CONCLUSION 

In this paper, we presented the design and implementation of TBE, a 
visual trigger rule specification interface. QBE was extended to handle 
features specific to ECA trigger rules. TBE extends the visual querying 
mechanism of QBE and applies it to the construction of triggers. 
Examples demonstrating the SQL3-based trigger rule generation procedure, 
as well as the TBE to SQL3 trigger translation algorithm, were given. 
Extensions that make TBE a universal trigger rule interface were also 
discussed. For a trigger system S, we can declaratively specify the 
syntax mapping between TBE and S, so that we can use TBE not only as a 
trigger rule formation tool, but also as a universal intermediary for 
translations between supported systems. 
Abstract WebSA (Web Site Agent) is a web page recommendation system that 
helps users navigate efficiently within a web site. The system discovers 
the users’ access patterns by mining the raw data from a web log file. 
The user interface of WebSA presents the weighted recommendations in 
a separate frame of the web page. Two types of recommendations are 
given: one based on the user’s access patterns and the other based on 
the overall pattern of all the users of that web site. To support the visual 
interface and the discovery process, WebSA uses database technology to 
manage, query, update, and concurrently access the web log data. The 
bulk of the computations for mining and updating the information is 
done on a daily basis in advance. This method, together with the fine 
tuning of the database, guarantees high performance, even when the raw 
access log data is on the order of 1 GB. 

Keywords: collaborative filtering, visual interface, web, data mining, access pattern, 
database 

1. INTRODUCTION 

To aid visitors in navigating within a web site, commonly used 
techniques include site maps, lists of topics, and general keyword search 
mechanisms. However, these approaches have some drawbacks. First, 
several interactions may be required from visitors, such as loading 
the pages that host those search services. Visitors are also responsible 
for locating the target page by providing related keywords or choosing 
categories. Second, many search engines are relatively poor at filtering 
out the “noise”. Therefore, a large list of links could be generated. 
Occasionally, either the keywords are not found or irrelevant results are 
generated. 

* Research supported in part by the National Science Foundation CAREER Award IRI- 
9896052 and CISE Research Instrumentation Grant 9729878. 

Information filtering, whereby information is supplied to the user by 
means of a static query embodying the user’s preferences on a set of 
dynamic documents [2], and in particular collaborative filtering, have 
been used to recommend relevant documents to users. Collaborative 
filtering makes use of the preferences of other people to predict the 
documents that may be of interest to a particular user [8, 14, 15]. 

As summarized by Paul Resnick at the Collaborative Filtering Work- 
shop:^ 

“Guiding people’s choices of what to read, what to look at, what to 
watch, what to listen to (the filtering part); and doing that guidance 
based on information gathered from some other people (the collaborative 
part).” 

In this paper, we explore a particular collaborative filtering technique 
as embodied by the Web Site Agent (WebSA) that can be used to aid 
visitors in navigating within a web site. The WebSA system provides 
recommendations to web users through the use of collaborative filtering. 
Our approach has been deployed within the web site of the Computer 
Science Department of the Worcester Polytechnic Institute (herein ab- 
breviated as CSD-WPI). The approach can be easily transported to any 
other web site, provided that the access log for that site is made avail- 
able. 

To improve the quality of collaborative filtering services, one has to 
design an effective data mining strategy to automate information discov- 
ery from the raw data and to be able to supply information to the users 
in a timely fashion. To meet these requirements, we deploy database 
technology to manage, query, update, and concurrently access the web 
log data. The bulk of the computations for updating and mining the 
information is done on a daily basis in advance. This method, together 
with the fine tuning of the database, guarantees high performance, even 
when the raw access log data is on the order of 1 GB. 

This paper is organized as follows. In Section 2 we mention related 
work and compare our approach with those more closely related to it. 
Section 3 describes the overall architecture of WebSA. Section 4 de- 
scribes the raw data, the mining algorithm, and the relational schema 
used. The design of the user interface and a preliminary usability 
study are discussed in Section 5. Implementation details encompassing 
performance issues, tools used, and solutions to specific problems are 
outlined in Section 6. Finally, we summarize and present directions for 
future work. 

^ http://www.sims.berkeley.edu/resources/collab/collab-report.html 

2. COMPARISON WITH RELATED WORK 

Software tools sometimes called “agents” are being deployed for speed- 
ing up the information retrieval process on the web in general [1, 3, 4, 6, 
9, 10, 11, 16, 17] and within individual web sites in particular [7, 12, 13]. 

Such agents issue recommendations by modeling the user’s browsing 
progress, incorporating heuristics to model the user’s behavior [9]. In 
some systems, the user is also asked to give feedback on the retrieved 
documents [3, 4], in some cases for the purpose of training a learning 
algorithm [1]. 

Other agents concentrate on particular web sites, since they require 
knowledge of the users’ actions such as that captured in the log files for 
that web site [7, 12, 13]. For example, in [7] a user’s interest is inferred 
from the actions of the user on the web pages (such as following links out 
of that page, mouse and scrolling activity). The approach that is closest 
to our own (and has provided motivation for WebSA) is by Perkowitz and 
Etzioni [12, 13], who analyze in detail the links that are followed by the 
users and propose the “Adaptive Web Sites” approach, the name being 
derived from web sites whose organization changes to reflect the users’ 
preferences. They have realized that different users have distinct goals 
and that the original layout of pages in a web server site might hide the 
most important or frequently used pages in “unlikely” places, making it 
inconvenient for users to retrieve them. Another realization is that the 
web site designer’s original expectation of usage may be violated. Since 
a site is used in many ways in practice, it is hard for the designer of a 
site to cover every aspect in the initial development. 

To address these concerns, they propose web sites that automatically 
improve their organization and presentation by learning from visitor 
access patterns. The sites may adapt their presentation to satisfy 
individual users’ needs by observing their individual information, which 
is sometimes called customization, or to satisfy overall users’ 
interactions, which is called optimization. 

In their paper, they propose to improve users’ information retrieval 
speed in a particular site by creating an index page, i.e., a page contain- 
ing links to a set of related pages, which are estimated to be the users’ 
most favorable pages. To achieve this, they present a cluster mining 
algorithm that takes Web server logs as input and outputs the contents of 
candidate index pages. The web access log is processed into visits, where 
one day counts as one visit. For each visit, the co-occurrence frequencies 
between pages are computed and represented in a similarity matrix. Next, 
a graph is generated corresponding to the matrix. By finding cliques 
or connected components in the graph, the graph is then divided into 
sub-graphs, where each sub-graph is considered as a cluster of related 
pages of a particular topic. For each cluster found, an index page is 
created. 
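
A rough sketch of the pipeline summarized above (visits, co-occurrence counts, graph, connected components) is given below. It is not Perkowitz and Etzioni’s actual algorithm: the function names, the min_count threshold used to decide which page pairs become graph edges, and the sample visits are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence(visits):
    """Count, for each unordered page pair, the visits containing both pages."""
    counts = defaultdict(int)
    for pages in visits:
        for a, b in combinations(sorted(set(pages)), 2):
            counts[(a, b)] += 1
    return counts

def clusters(visits, min_count=2):
    """Link pages that co-occur at least min_count times (assumed threshold)
    and return the connected components of the resulting graph."""
    graph = defaultdict(set)
    for (a, b), n in cooccurrence(visits).items():
        if n >= min_count:
            graph[a].add(b)
            graph[b].add(a)
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            page = stack.pop()
            if page in component:
                continue
            component.add(page)
            stack.extend(graph[page] - component)
        seen |= component
        result.append(sorted(component))
    return result

# Hypothetical visits: each inner list holds the pages seen in one visit.
sample_visits = [["a", "b"], ["a", "b"], ["c", "d"], ["c", "d"], ["b", "c"]]
found = clusters(sample_visits)
```

In the sample, pages b and c co-occur only once, so with min_count=2 they are not linked and two clusters result; each cluster would correspond to one candidate index page.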

In our paper we mine the access logs in a different way, by considering 
trees that model the users’ traversals of the site. New pages are 
recommended in decreasing order of the probability that they were 
visited after the page currently being viewed by the user. Links that 
appear on the current page are excluded from the recommended list. 
There are two types of recommendations. One is based solely on that 
particular user’s past pattern of access to that site. The other looks 
at all the users that have visited that site. In this way, new users to 
a site can follow the most popular traversal paths of that site, hoping 
to find relevant information fast. Notice that since the interface 
displays the name of the page, users can perform a simple “content-based 
filtering” on a small list of links. Returning users to the site, on the 
other hand, could retrace their previous steps. 

Another difference from the work of Perkowitz and Etzioni [12] is that 
we do not change the organization of the site or of each individual page. 
Both approaches have potential merits and drawbacks. Without exten- 
sive user studies, it is difficult to exactly evaluate the two approaches. 
However, it is well-known that consistency is one of the golden rules of 
user interface design. The reorganization of a slowly evolving web site 
could result in unwanted inconsistency. On the other hand, frequently 
updated web sites, such as those that provide news, sports events, and 
stock quotes might benefit from constant rearrangement based on overall 
users’ interest. 

3. WEBSA OVERVIEW 

The user interface that we propose for WebSA is a simple modification 
of the current web pages of the CSD-WPI web site. WebSA delivers a 
recommendation service in a separate frame as illustrated in Figure 1. 

Our “raw” data for analysis comes from the web server logs and is 
essentially “semi-structured” data. To organize the raw data in the logs 
and to increase the performance of the data mining process, we use an 
Oracle database for managing the log data. The analysis of the visitors’ 
past accesses can be accomplished in two basic ways: customization and 
optimization. In the former, the analysis is done on an individual 
visitor’s past accesses, and its results are presented only to that 
user. In the latter, past accesses for all visitors are analyzed. 

Figure 1 The WebSA interface, as provided by a separate frame at the bottom of a 
browser page. 

Both the customization and optimization phases aim at discovering the 
access patterns of the visitors, and based on the discovered patterns, a 
ranked list of links is produced. Such lists are generated in highest to 
lowest recommended order. Furthermore, additional information such as 
the hit frequency, hit probability and time spent on each of the 
recommended links is provided as a reference to visitors. These 
references help visitors locate their desired pages in less time and 
view their past access statistics as well. 

All the heavy computation of these statistical values and 
recommendation links is done by a Java program on our server. Instead of 
processing all the computations at the time that the visits occur, WebSA 
discovers the access patterns in advance on a nightly basis. In this way, 
the list of recommendations can be directly retrieved from Oracle upon 
request, to save processing time. This information is acquired from 
the Oracle database by a Java server program through the Java Database 
Connectivity (JDBC) facility. 

For the front end, there is a Common Gateway Interface (CGI) program 
for each visitor using WebSA. This program gathers the necessary 
identification information from visitors, such as IP addresses and the 
URL of the visitors’ current page. These identification arguments are 
delivered to the Java server program for retrieving the recommendations. 
Furthermore, the CGI program outputs the recommended links and the 
statistical values to the visitor’s current page. 

An overview of the implemented architecture is shown in Figure 2. 





Figure 2 The WebSA Architecture 



4. DATA MINING 

In this section we describe the data mining process, starting from 
the analysis of the semi-structured log file, the extraction of the user 
patterns, and the database modeling of the data that is extracted and 
therefore made structured. 

4.1. LOG FILE DATA 

As mentioned previously, a way to provide useful recommendations 
is to analyze the users’ past access behavior. The access patterns of all 
visitors to a site can be found in the web server’s log file. A partial view 
of the CSD-WPI web server log is shown in Figure 3. 

Figure 3 A fragment of the access_log file of the CSD-WPI web site. 



In the access log file, each line represents a transaction of transferring 
a file, which can be an HTML file, a picture image, a TXT format file, 
an executable file, or many other types of files. When a user accesses a 
web page at the CSD-WPI web site, one or more access lines are added 
to the end of the access log file, depending on how many files need to 
be loaded for forming that target page. Due to the nature of recording 
new accesses to the end of access log, the access log is already ordered 
sequentially by access time. Furthermore, each access line in the log is 
uniformly recorded. A decomposition of the fixed structure of each line 
is illustrated in Figure 4. There are cases for which some of the fields 
are unavailable: users may explicitly configure their web browser not 
to send out header information to the web server for privacy reasons, 
or sometimes instead of following links, users type in a URL to reach 
a page, or sometimes users reload a page. For each of the unavailable 
fields, its contents will be indicated with a character in the access 
log. 
Figure 4 The format of each access line in the CSD-WPI’s web server log file. 



By carefully studying the access log’s properties, as described above, 
we consider only the helpful fields in each access line for our further 
analysis, including the IP, access time, current page, next page and 
browser fields. 

The IP field can be used to identify a user. We assume that each 
user will have a fixed IP address. This implies that a bias might occur 
in reflecting users’ individual past access patterns if they share 
the same proxy server, computer, or dial-in connection. Note that for 
dial-in accesses through an Internet Service Provider, a random IP is 
generated each time. We are targeting users of fixed computers with 
broadband connections. Alternatives to our approach, with their own 
drawbacks, are: storing a cookie in the user’s browser, in which case 
further analysis is precluded by users refusing cookies, or asking users 
to sign up, an extra step (which often involves the use of cookies) that 
makes the recommendation service less transparent and may therefore 
discourage certain users. 

The access time stamp field will allow us to calculate the duration a 
user has spent on a particular page. Since the access lines in the log file 
are ordered by time, we can simply compute the time difference between 
the entering and leaving time stamp of a page to figure out the duration 
of access to that page. From the current and next page fields in each 
access line, we can find out the hit frequency on each page in the past 
and their direct links to other pages. Furthermore, the hit probability of 
each page can be computed from the hit frequency and the paths users 
followed to get to that page by tracing the direct links from previously 
visited pages using a breadth-first search method. 
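
The duration and hit frequency computations described above can be sketched as follows. The record layout (IP, time stamp in seconds, current page, next page) is a simplified assumption rather than the actual log format, and the sketch further assumes that a page is left when the same IP produces its next access line, so the last page of each user gets no duration.

```python
from collections import Counter, defaultdict

def time_spent(records):
    """Sum, per (ip, page), the seconds between entering and leaving a page."""
    by_ip = defaultdict(list)
    for ip, ts, _cur, nxt in records:
        by_ip[ip].append((ts, nxt))  # the user enters page nxt at time ts
    spent = defaultdict(int)
    for ip, hits in by_ip.items():
        hits.sort()  # the log is time ordered; sorting keeps this explicit
        for (t_enter, page), (t_leave, _next_page) in zip(hits, hits[1:]):
            spent[(ip, page)] += t_leave - t_enter
    return dict(spent)

def hit_frequency(records):
    """Count how often each (ip, current page, next page) link was followed."""
    return Counter((ip, cur, nxt) for ip, _ts, cur, nxt in records)

# Hypothetical, already filtered access lines.
sample_log = [
    ("10.0.0.1", 0, "/index.html", "/a.html"),
    ("10.0.0.1", 30, "/a.html", "/b.html"),
    ("10.0.0.1", 90, "/b.html", "/a.html"),
    ("10.0.0.2", 10, "/index.html", "/a.html"),
]
durations = time_spent(sample_log)
frequencies = hit_frequency(sample_log)
```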

4.2. ACCESS PATTERN EXTRACTION 

In this project, a tree modeling method is deployed for extracting 
patterns in the otherwise seemingly unorganized collection of log data. 
For each IP address and each page, a tree is constructed to represent 
the possible paths from that page. An example of such a tree model is 
illustrated in Figure 5. 

The page being currently examined by the user is the root node of the 
tree. Each internal node in the tree represents a page that is at most 
five links away from the current page. 

The edges indicate direct links between two pages. For each internal 
node, the hit frequency, hit probability, time spent, and percentage of 
time spent are calculated. In the CSD-WPI web site, the average page 
contains more than ten links. Therefore, the tree may grow exponentially 
down each level. For storage space and retrieval performance concerns, 
only the top five levels of the tree are considered. Only pages at levels 
3, 4, and 5 are considered as recommendations to the users, since pages 
in level 2 are direct links from the current page, which are just one click 
away. Therefore level 2 is excluded to avoid repetitive links within the 
original page. 

Figure 5 A tree model of all possible paths from page1.html, corresponding to the 
INDIVIDUAL2 table of Figure 6. 
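
The level rule above (root at level 1, direct links at level 2, candidates at levels 3 to 5) can be sketched with a breadth-first traversal. The function name and the sample link structure are hypothetical, and the sketch assigns each page the shallowest level at which it is reached, which is a simplifying assumption about how repeated pages are treated.

```python
from collections import deque

def candidate_pages(links, root, max_level=5):
    """Return {page: level} for pages at levels 3..max_level below root.

    links maps a page to the pages that were reached directly from it.
    """
    level = {root: 1}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        if level[page] == max_level:
            continue  # do not expand below the last level that is kept
        for nxt in links.get(page, ()):
            if nxt not in level:  # keep the shallowest level (assumption)
                level[nxt] = level[page] + 1
                queue.append(nxt)
    # Levels 1 and 2 are the current page and its direct links: excluded.
    return {page: lvl for page, lvl in level.items() if lvl >= 3}

# Hypothetical direct links observed for one user.
sample_links = {
    "p1": ["p2", "p3"],
    "p2": ["p4"],
    "p4": ["p5"],
    "p5": ["p6"],
    "p6": ["p7"],
}
candidates = candidate_pages(sample_links, "p1")
```

In the sample, p2 and p3 are dropped as direct links of the root, and p7 is dropped because it lies below level 5.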

4.3. DATABASE MODELING 

Data is stored in the Oracle database after the mining operation 
on the access log. Figure 6 shows the tables in the Oracle database, 
which contain all the intermediate and final recommendation data. For 
simplicity, only the data for a single IP address is displayed (while the 
actual tables hold data for all IPs). The WEBLOG table stores all the 
useful fields from the access log; each tuple corresponds to one access 
line. Tuples in the WEBLOG table are sorted by time in ascending 
order, like the access lines in the access log file. The difference is that 
the WEBLOG table contains only the accesses that remain after filtering 
out the “noise” that results from requests for image files, direct 
accesses from outside the CSD domain, and the CGI pages. 

The INDIVIDUAL1 table combines all tuples with the same IP, CurPage 
and NextPage in the WEBLOG table. The number of occurrences 
of each of these tuples is stored in the Hits attribute. The HitProb 
attribute, which stands for hit probability, is computed by dividing the 
hit number from the CurPage to the NextPage by the total hit number from 
CurPage to all of its direct next pages. The values for the TimeSpent 
attribute are calculated by evaluating the time difference between the 
time stamps for entering and leaving the NextPage in the WEBLOG table. 
If more than one access to that page is found, all the access times are 
combined. We have chosen to give a greater weight to the most recent 
access. Older access patterns will, in this way, be phased out more 
quickly. 

Figure 6 Database tables of processed access logs, which contain all the intermediate 
and final recommendation data. 
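
The Hits and HitProb computation for this first intermediate table can be sketched as below. The tuple layout and the function name individual1 are assumptions for illustration; the TimeSpent column and the recency weighting, whose exact form is not specified here, are left out.

```python
from collections import Counter

def individual1(weblog):
    """Group WEBLOG tuples (ip, cur_page, next_page) into Hits and HitProb.

    Returns {(ip, cur_page, next_page): (hits, hit_prob)}, where hit_prob
    is hits divided by all hits leaving cur_page for the same ip.
    """
    hits = Counter(weblog)
    totals = Counter()
    for (ip, cur, _nxt), n in hits.items():
        totals[(ip, cur)] += n
    return {key: (n, n / totals[key[:2]]) for key, n in hits.items()}

# Hypothetical filtered WEBLOG tuples for a single IP address.
IP = "10.0.0.1"
sample_weblog = [(IP, "p1", "p2")] * 3 + [(IP, "p1", "p3"), (IP, "p2", "p4")]
table1 = individual1(sample_weblog)
```

Here three of the four hits leaving p1 go to p2, so that link gets a hit probability of 0.75.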

Both the WEBLOG and INDIVIDUAL1 tables are used to store 
intermediate values for computing the final results in the INDIVIDUAL2 
and OVERALL tables. The INDIVIDUAL2 table contains the recommendation 
trees for each page an individual user has visited. An illustration of 
the tree model is shown in Figure 5. The contents in the RelatedPage 
attribute are the candidates of recommendation pages, as in the levels 
3, 4 and 5 in the tree model. 

In the INDIVIDUAL2 table, the Hits attribute can be directly 
obtained from INDIVIDUAL1, whereas the HitProb attribute is calculated 
by multiplying each node’s HitProb value with its parent nodes’ 
HitProb value from INDIVIDUAL1. This yields the hit probability of each 
internal node with respect to the current page, or root page of the tree. 
This algorithm is intended to avoid the possible bias produced by 
overlapping trees, in which case the hit number is higher but does not 
reflect the users’ past choices starting from the current page. To make 
the sum of the hit probabilities of all candidate pages equal 100%, the 
HitProb column is normalized. The TimeSpent column is fetched directly 
from the INDIVIDUAL1 table by querying for the same IP and NextPage. 
The TimePercent, which stands for the percentage of time spent on a 
page, is calculated by dividing each TimeSpent value by the total 
TimeSpent of all the candidate pages in the tree in INDIVIDUAL2. 
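
The path-based hit probability and its normalization can be sketched as follows. The edge probabilities play the role of the HitProb values in the first individual table; the function name and sample numbers are hypothetical, and adding up the probabilities of a page reached by several paths is an assumption of this sketch.

```python
from collections import defaultdict

def individual2_hitprob(edge_prob, root, max_level=5):
    """Multiply edge probabilities along each path from root, keep the
    pages at levels 3..max_level, and normalize so the values sum to 1."""
    children = defaultdict(list)
    for (cur, nxt), p in edge_prob.items():
        children[cur].append((nxt, p))
    raw = defaultdict(float)
    stack = [(root, 1, 1.0)]  # (page, level, probability of the path so far)
    while stack:
        page, level, prob = stack.pop()
        if level >= 3:
            raw[page] += prob  # pages on several paths add up (assumption)
        if level < max_level:
            for nxt, p in children[page]:
                stack.append((nxt, level + 1, prob * p))
    total = sum(raw.values())
    if total == 0:
        return {}
    return {page: prob / total for page, prob in raw.items()}

# Hypothetical per-link hit probabilities for one IP address.
sample_edges = {
    ("p1", "p2"): 0.75,
    ("p1", "p3"): 0.25,
    ("p2", "p4"): 0.5,
    ("p2", "p5"): 0.5,
}
probabilities = individual2_hitprob(sample_edges, "p1")
```

In the sample, p4 and p5 each have a raw path probability of 0.75 × 0.5 = 0.375; after normalization over the two candidates each is 0.5.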

The OVERALL table combines all IPs’ trees with the same cur- 
rent/root page. The Hits column is computed by summing all the Hits 
with the same CurPage and RelatedPage in INDIVIDUAL2. In the 
OVERALL table, the hit probability, the HitProb attribute, is based on 
the hit number instead of the percentage approach for INDIVIDUAL2’s 
HitProb column. For the OVERALL table, the HitProb column is cal- 
culated by dividing each node’s hit number by the total hit number of 
all candidate nodes in the tree. The TimeSpent for each page can be 
directly fetched from the INDIVIDUAL2 table, whereas the time spent 
percentage can be obtained by dividing each node’s TimeSpent by the 
total time spent of all candidate nodes in the tree. 
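
The aggregation over all IP addresses can be sketched as below. The row layout (IP, CurPage, RelatedPage, Hits) and the function name overall are assumptions made for illustration; the time columns are omitted.

```python
from collections import defaultdict

def overall(individual2_rows):
    """Sum Hits over all IPs per (cur_page, related_page) and derive HitProb
    as hits divided by the total hits of all candidates of the same root."""
    hits = defaultdict(int)
    for _ip, cur, related, n in individual2_rows:
        hits[(cur, related)] += n
    totals = defaultdict(int)
    for (cur, _related), n in hits.items():
        totals[cur] += n
    return {key: (n, n / totals[key[0]]) for key, n in hits.items()}

# Hypothetical rows from the per-user recommendation trees.
sample_rows = [
    ("10.0.0.1", "p1", "p4", 3),
    ("10.0.0.2", "p1", "p4", 1),
    ("10.0.0.2", "p1", "p5", 4),
]
overall_table = overall(sample_rows)
```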

5. USER INTERFACE AND PRELIMINARY 
USABILITY STUDY 

WebSA’s user interface (see Figure 1) is provided by a CGI 
application. It takes care of the overall layout, page design and the 
interaction between the user and WebSA. 

The first interface design rationale considered was to avoid layout 
modifications of the original web page design. Therefore, the display of 
the original page is not modified. The services of WebSA are provided 
in a separate frame of the browser window, which keeps the original 
structure and design intact, while the services are always presented to 
the users in the bottom frame. 

The second rationale of the interface design is to minimize the space 
taken by the recommendation service in the browser, as browser space is 
limited. Thus, instead of itemizing and listing all the recommendations, 
the CGI application of WebSA provides the recommendations in a pull-down 
menu. 

The third rationale of our interface design is to attract the user’s 
attention to the services provided by WebSA. This can be accomplished 
with simple means, such as the choice of background color, the color 
scheme of the service page, and the design of the icons and logo. 

The last rationale and also the golden rule of interface design is the 
consistency of the interface. The frame layout is set by WebSA, where 
the height of the bottom frame is fixed and occupies the least possible 
space. The design of both service pages varies as little as possible, 
which increases the users’ familiarity with the user interface of WebSA. 

Figure 7 shows the WebSA frame for the detailed service where: (1) 
both individual- and overall-based recommendations are provided; (2) 
for each of them, ten ranked links are given instead of the five links for 
the default service shown in Figure 1. 
Figure 7 The WebSA detailed interface, as provided by a separate frame at the 
bottom of a browser page. 



We have conducted a preliminary user survey in which ten users from 
CSD-WPI participated. 80% of the people surveyed used fixed 
IP addresses to access the Internet. Therefore most people could take 
full advantage of the personal recommendation service (based on their 
IP address). All of the people surveyed accessed the CSD-WPI 
web pages for at least half an hour every day, so they were in a good 
position to compare the effectiveness of the service with regular 
browsing without the service. From all the data we collected, 69% of the 
users agree that WebSA is helpful or somewhat helpful for browsing the 
CSD-WPI web site, 78% feel the interface design of WebSA is intuitive, 
and 80% are satisfied with the performance of WebSA. 

6. IMPLEMENTATION 

6.1. PERFORMANCE ISSUES 

Performance was a major concern in designing WebSA. If users had 
to wait significantly for the services, they would probably prefer not to 
use them. Another objective was that the backend computations for 
updating the service data should not take too long, so as not to unduly 
overload our Oracle server or local network. To eliminate the delay in 
delivering services to the users, we chose to pre-calculate the service 
data on a daily basis. In this way, no major computations are performed 
while recommendations are provided to the users. To offer the 
recommendations, the CGI application simply queries the pre-computed 
results from the database and then displays them to users. The query 
process is made efficient by using indices that match the queries. 

The background process for preparing the service needs to be 
optimized too. The web log’s size is increasing at the rate of 2.6 
megabytes per day. After a couple of months, the log file would be on 
the order of hundreds of megabytes. Data mining for useful 
recommendations requires the repetitive retrieval of certain patterns 
from the large log file. These operations can be very expensive, since 
loading the log file will take up most of the memory in the operating 
system and will yield tremendous disk access, which is 1000 times slower 
than memory access on average. We therefore used an Oracle database to 
store the web log data. Furthermore, we made use of the built-in 
utilities provided by the database, including indices for faster data 
retrieval and the concurrent retrieval synchronization feature. 

When doing database computations using Java, some special techniques
were deployed. These techniques include using prepared statements,
sending a set of SQL commands in one transaction to avoid overhead
costs, and dropping indices before heavy modifications of database
tables and re-creating them once the updates are finished, so that
retrieval remains fast.
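
The first two techniques can be illustrated with a minimal JDBC sketch. It is not the code used in WebSA: the table and column names (access_log, ip, url, referrer) are hypothetical, since the schema is not given here, and the class name BatchLoader is only for illustration. The sketch reuses one prepared statement for all rows, queues the rows as a batch, and commits them as a single transaction.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class BatchLoader {
    // Hypothetical table and columns; the real WebSA schema is not known.
    static final String INSERT_SQL =
        "INSERT INTO access_log (ip, url, referrer) VALUES (?, ?, ?)";

    /**
     * Inserts all rows with one prepared statement, one batch and one
     * transaction. Each row is expected to hold {ip, url, referrer}.
     * Returns the number of statements that were sent in the batch.
     */
    static int insertAll(Connection conn, List<String[]> rows) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);            // group all inserts in one transaction
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.setString(3, row[2]);
                ps.addBatch();                // queue the row instead of sending it now
            }
            int[] counts = ps.executeBatch(); // send the queued rows together
            conn.commit();
            return counts.length;
        } catch (SQLException e) {
            conn.rollback();                  // leave the table unchanged on failure
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Dropping and re-creating indices around such a bulk load would be done with ordinary DDL statements on the same connection; that step is omitted because the index names are not known.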

The detailed architecture of WebSA is shown in Figure 8. 

6.2. SOFTWARE TOOLS 

For the database implementation we used an Oracle database server
installed on an NT server. Configuration settings such as the cache
memory size and the rollback buffer were adjusted in order to increase
performance. For the purposes of monitoring and administering the
database, we also installed an Oracle 8 SQL client application on the
development machine.

Figure 8 Functional model of WebSA

The major development languages in our project are Java and Visual
C++. Java is a convenient development language because it provides the
developer with rich built-in libraries. Visual C++ was also used because
it provides easy access to sockets in the WIN32 environment.
Communication with the Oracle database server relies on a JDBC
connection. We used the Xitami web server, which is lightweight and
supports CGI.

6.3. PROBLEMS AND SOLUTIONS 

In this section, we discuss some of the problems that arose during the 
implementation and the chosen solutions. 

The connection between the Query Server and the CGI application is a
TCP socket. However, these two applications are implemented in
different development languages. Because JDBC makes it easy to connect
to the Oracle database, the Query Server is written in Java, while the
CGI application is written in C++, as we consider C++ a better CGI
programming language than Java. The problem arose because the two socket
endpoints were created using different programming languages: the Java
server socket and the WIN32 socket created by C++ did not work properly
together, and there were always some data-transfer errors in the
communication. To guarantee the correctness of data transfer between
these two sockets, we added some extra user-defined control signals to
the communication between the client and server sockets. Start and end
transfer signals delimit each message, and a parser on each side checks
the messages received and requests recovery if it detects that a message
has been corrupted. With these user-defined controls, the two programs
communicate reliably over the TCP connection regardless of the
differences in their internal socket mechanisms. In retrospect
(and for a future implementation), accessing the database directly from
the CGI application would have been simpler.
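
A minimal sketch of such a framing scheme is given below, under stated assumptions. The actual control signals and the way WebSA detects corruption are not specified in this paper. The markers <START> and <END>, the CRC-32 checksum, and the class name FramedMessage are all hypothetical choices made for illustration. An empty result from decode stands for the case in which the receiver would request the message again.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.zip.CRC32;

class FramedMessage {
    // Hypothetical start and end transfer signals.
    static final String START = "<START>";
    static final String END = "<END>";

    // Assumed corruption check: CRC-32 over the UTF-8 bytes of the payload.
    static long checksum(String payload) {
        CRC32 crc = new CRC32();
        crc.update(payload.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Wraps a payload as START + hex checksum + '|' + payload + END. */
    static String encode(String payload) {
        return START + Long.toHexString(checksum(payload)) + "|" + payload + END;
    }

    /**
     * Returns the payload of a well-formed frame, or an empty Optional if
     * a signal is missing or the checksum does not match, in which case
     * the caller should ask the sender to transmit the message again.
     */
    static Optional<String> decode(String frame) {
        if (frame.length() < START.length() + END.length()
                || !frame.startsWith(START) || !frame.endsWith(END)) {
            return Optional.empty();
        }
        String body = frame.substring(START.length(), frame.length() - END.length());
        int sep = body.indexOf('|');
        if (sep < 0) {
            return Optional.empty();
        }
        String expected = body.substring(0, sep);
        String payload = body.substring(sep + 1);
        return expected.equals(Long.toHexString(checksum(payload)))
                ? Optional.of(payload)
                : Optional.empty();
    }
}
```

Because both sides only exchange text delimited by explicit signals, the same scheme can be implemented independently in Java and in C++ without relying on how either socket library buffers data.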

Another problem arose when the interface of WebSA was expanded to
take arguments from different HTML formats. The arguments passed
to the CGI application came either from direct argument parsing, i.e.,
filtered links, or from form submission, i.e., recommendations selected
from the pull-down menu of WebSA. This made it difficult for the CGI
application to cope with the arguments: direct arguments and form
submissions must be handled differently, otherwise the CGI application
crashes. The solution to this problem was to add an extra argument
check to the CGI application. The CGI application first checks whether
a direct argument is present; if so, it is taken as the user request.
If the direct argument is absent, the CGI application obtains the
argument from the form submission through the CGI environment
variables.
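
The check can be sketched as follows. The WebSA CGI application itself is written in C++. The Java version below is only an illustration, and the form field name "page" and the class name RequestArgument are hypothetical. QUERY_STRING is the standard CGI environment variable that carries form data submitted with the GET method.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;

class RequestArgument {
    // Hypothetical name of the pull-down menu field in the form.
    static final String FIELD = "page";

    /**
     * Returns the user request: the direct argument if one was passed,
     * otherwise the value of FIELD taken from QUERY_STRING, otherwise
     * an empty Optional.
     */
    static Optional<String> resolve(String[] argv, Map<String, String> env) {
        // Step 1: a direct argument (filtered link) takes precedence.
        if (argv.length > 0 && !argv[0].isEmpty()) {
            return Optional.of(argv[0]);
        }
        // Step 2: fall back to the form submission in the environment.
        String query = env.getOrDefault("QUERY_STRING", "");
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(FIELD)) {
                String raw = pair.substring(eq + 1);
                return Optional.of(URLDecoder.decode(raw, StandardCharsets.UTF_8));
            }
        }
        return Optional.empty();
    }
}
```

In a real CGI program the two inputs would be the command-line arguments and the process environment, for example resolve(args, System.getenv()).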

An interesting problem arose when users accessed the CSD-WPI web
site through WebSA. In this case, the IP address of the user’s local
machine was lost. Also, due to missing HTTP header information, which
usually does not happen with browsers like Internet Explorer and
Netscape Navigator, the user’s current page was not recorded in the
CS access_log file. Examining a line recorded in the access_log file,
we can clearly identify the information loss when using WebSA:

130.215.8.113 - - [17/Mar/1999:16:03:42 -0500] "GET / HTTP/1.0"
200 5653

The IP address field is always recorded as the IP address of our host
(http://cleo.wpi.edu), which is the NT server machine of WebSA.
The fields for the currently browsed page and the browser information
are missing. This would, in the long term, affect the analysis
performed by WebSA, especially if it became heavily used. The solution
we used to resolve this problem was to submit additional HTTP header
information when requesting the document from the web server. The
additional HTTP header information consists of the current page
and the browser information. We also noticed that the CGI application
has no way to keep track of the current web page’s URL. Therefore a
new tag, <REFERRER>, was introduced for the specific use of WebSA.
This tag records the URL of the page being browsed, for the reference
of the CGI application, and it avoids setting cookies on the user’s
local machine. After these changes, the line in the access_log
corresponding to the above example becomes

130.215.8.113 - - [17/Mar/1999:16:03:42 -0500] "GET / HTTP/1.0"
200 5653 "http://www.cs.wpi.edu/" "Mozilla/4.5 [en] (WinNT; I)
&&130.215.8.113"

The Data Server of WebSA was also adjusted in order to work with
the extra piece of information appended at the end of the browser field
of the access_log file.
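
As an illustration of this adjustment, the following sketch parses an extended log line of the form shown above and separates the appended address from the browser field. It is not the Data Server code: the class name AccessLogEntry and the regular expression are assumptions, and the sketch expects the log entry on a single line with the fields separated by single spaces.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AccessLogEntry {
    // host ident user [time] "request" status bytes "referrer" "browser"
    static final Pattern LINE = Pattern.compile(
        "^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+)"
        + " \"([^\"]*)\" \"([^\"]*)\"$");

    final String clientIp;
    final String request;
    final String referrer;
    final String browser;

    AccessLogEntry(String clientIp, String request, String referrer, String browser) {
        this.clientIp = clientIp;
        this.request = request;
        this.referrer = referrer;
        this.browser = browser;
    }

    /** Parses one log line; the address after "&&" overrides the host field. */
    static AccessLogEntry parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not an extended log line: " + line);
        }
        String agent = m.group(7);
        int marker = agent.lastIndexOf("&&");
        // Without the marker, fall back to the host field and keep the
        // browser field unchanged.
        String ip = marker >= 0 ? agent.substring(marker + 2).trim() : m.group(1);
        String browser = marker >= 0 ? agent.substring(0, marker).trim() : agent;
        return new AccessLogEntry(ip, m.group(3), m.group(6), browser);
    }
}
```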

7. CONCLUSIONS 

The goal of the Web Site Agent (WebSA) is to provide a recommen- 
dation service to help web users better browse sites according to the 
analysis of the data available from the access log file. The analysis takes 
into account the access probability for the pages of the site, with the 
page currently under examination as a starting point. The WebSA is 
dynamic in that it incorporates pattern changes on a daily basis. 

The WebSA interface differs from the Adaptive Web Site interface
[12, 13] in that the layout of the web site pages does not change
according to the recommendations. From a human-computer interaction
perspective, where consistency of the interface is one of the principal
directives, our approach seems more adequate, although studies would
have to be conducted to confirm that this otherwise "universal
principle" holds in the context at hand.

Providing WebSA with database support is also a contribution of our
work. In this way, recommendations are answers to declarative queries
that could be changed or extended easily. Moreover, database technology
provides us with concurrent access to the data and with fast
performance, even when the accumulated raw data is on the order of
1 GB.

8. FUTURE WORK 

After collecting suggestions from our surveys and from colleagues,
several extensions to the current work have become apparent, some of
which can be implemented directly given the data already collected.
Furthermore, the user interface can be extended to be more flexible and
graphical. We group the directions for future work as follows:

1 Some people’s access patterns might be totally different from
others. Therefore, one could attempt to categorize users into
communities by examining the similarity between users’ access trees
and provide recommendations based on the community the user "falls"
in. The similarity of access trees can be evaluated, for example,
by counting the number of nodes that are common to different
users’ access trees.

2 Instead of providing the recommended links in a flat listing, one
could use a Java applet to produce a graphical representation of
the access tree. The graphical access tree could be expanded further
levels down by clicking on it, much like the folder hierarchy in
the Windows Explorer application (a comparable tree component is
available in Swing). Different user interface styles could be made
customizable to the specifications of the user [5].

3 WebSA does not support multi-framed pages. Multi-framed pages 
are currently not common in the CSD-WPI web site, but they 
might become common in the near future. One can extend our 
agent’s ability in this direction. 

4 Another possible direction for future work is to deploy WebSA in 
a larger, more complex, and less organized web site. This would 
be beneficial for further testing its robustness and usefulness. 
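
The similarity measure suggested in the first direction above can be sketched as follows, assuming that each user's access tree is reduced to the set of page URLs it contains. The class name AccessTreeSimilarity is hypothetical, and normalizing the count of common nodes by the size of the union (the Jaccard coefficient) is our own assumption rather than part of the proposal.

```java
import java.util.HashSet;
import java.util.Set;

class AccessTreeSimilarity {
    /** Number of nodes (page URLs) that occur in both access trees. */
    static int commonNodes(Set<String> treeA, Set<String> treeB) {
        Set<String> shared = new HashSet<>(treeA);
        shared.retainAll(treeB);
        return shared.size();
    }

    /**
     * Common nodes divided by all distinct nodes, a value in [0, 1].
     * Two empty trees are given similarity 0 by convention.
     */
    static double jaccard(Set<String> treeA, Set<String> treeB) {
        Set<String> union = new HashSet<>(treeA);
        union.addAll(treeB);
        return union.isEmpty() ? 0.0 : (double) commonNodes(treeA, treeB) / union.size();
    }
}
```

Users whose pairwise similarity exceeds some threshold could then be placed in the same community; choosing that threshold would require experiments on real access data.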